Início Rápido: Assistente Criar uma voz personalizada

Artigo
02/24/2024

Neste guia de início rápido, você usará o SDK de Fala para criar um aplicativo de assistente de voz personalizado que se conecta a um bot que você já criou e configurou. Se você precisar criar um bot, confira o tutorial relacionado para ver orientações mais abrangentes.

Depois de atender a alguns pré-requisitos, conectar o assistente de voz personalizado leva apenas algumas etapas:

Criar um objeto BotFrameworkConfig na chave de assinatura e na região.
Criar um objeto DialogServiceConnector usando o objeto BotFrameworkConfig acima.
Com o objeto DialogServiceConnector, inicie o processo de escuta para um único enunciado.
Inspecione o ActivityReceivedEventArgs retornado.

Observação

O SDK de Fala para C++, JavaScript, Objective-C, Python e Swift dá suporte a assistentes de voz personalizados, mas ainda não incluímos um guia a respeito.

Veja ou baixe todas as Amostras em C# do SDK de Fala no GitHub.

Pré-requisitos

Antes de começar, é preciso:

Criar um recurso de Fala
Configurar seu ambiente de desenvolvimento e criar um projeto vazio
Criar um bot conectado ao canal de Fala do Direct Line
Verificar se você tem acesso a um microfone para captura de áudio

Observação

Confira a lista de regiões compatíveis com assistentes de voz e implante seus recursos em uma dessas regiões.

Abra o projeto no Visual Studio

A primeira etapa é verificar se o projeto está aberto no Visual Studio.

Comece com código de texto clichê

Vamos adicionar um código que funciona como um esqueleto para o projeto.

No Gerenciador de Soluções, abra MainPage.xaml.

Na exibição XAML do designer, substitua todo o conteúdo pelo seguinte snippet que define uma interface do usuário rudimentar:

<Page
    x:Class="helloworld.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:local="using:helloworld"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d"
    Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">

    <Grid>
        <StackPanel Orientation="Vertical" HorizontalAlignment="Center"  
                    Margin="20,50,0,0" VerticalAlignment="Center" Width="800">
            <Button x:Name="EnableMicrophoneButton" Content="Enable Microphone"  
                    Margin="0,0,10,0" Click="EnableMicrophone_ButtonClicked" 
                    Height="35"/>
            <Button x:Name="ListenButton" Content="Talk to your bot" 
                    Margin="0,10,10,0" Click="ListenButton_ButtonClicked" 
                    Height="35"/>
            <StackPanel x:Name="StatusPanel" Orientation="Vertical" 
                        RelativePanel.AlignBottomWithPanel="True" 
                        RelativePanel.AlignRightWithPanel="True" 
                        RelativePanel.AlignLeftWithPanel="True">
                <TextBlock x:Name="StatusLabel" Margin="0,10,10,0" 
                           TextWrapping="Wrap" Text="Status:" FontSize="20"/>
                <Border x:Name="StatusBorder" Margin="0,0,0,0">
                    <ScrollViewer VerticalScrollMode="Auto"  
                                  VerticalScrollBarVisibility="Auto" MaxHeight="200">
                        <!-- Use LiveSetting to enable screen readers to announce 
                             the status update. -->
                        <TextBlock 
                            x:Name="StatusBlock" FontWeight="Bold" 
                            AutomationProperties.LiveSetting="Assertive"
                            MaxWidth="{Binding ElementName=Splitter, Path=ActualWidth}" 
                            Margin="10,10,10,20" TextWrapping="Wrap"  />
                    </ScrollViewer>
                </Border>
            </StackPanel>
        </StackPanel>
        <MediaElement x:Name="mediaElement"/>
    </Grid>
</Page>

O modo de exibição de Design é atualizado para mostrar a interface do usuário do aplicativo.

No Gerenciador de Soluções, abra o arquivo de origem code-behind MainPage.xaml.cs. (Ele está agrupado em MainPage.xaml.) Substitua o conteúdo deste arquivo pelo seguinte, que inclui:

using instruções para os namespaces Speech e Speech.Dialog
Uma implementação simples para garantir o acesso ao microfone, conectado a um manipulador de botão
Auxiliares básicos de interface do usuário para apresentar erros e mensagens no aplicativo
Um ponto inicial para o caminho do código de inicialização que será populado posteriormente
Um auxiliar para executar a conversão de texto em fala (sem suporte para streaming)

Um manipulador de botão vazio para iniciar a escuta que será populado mais tarde

using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Dialog;
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using Windows.Foundation;
using Windows.Storage.Streams;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;
using Windows.UI.Xaml.Media;

namespace helloworld
{
    public sealed partial class MainPage : Page
    {
        private DialogServiceConnector connector;

        private enum NotifyType
        {
            StatusMessage,
            ErrorMessage
        };

        public MainPage()
        {
            this.InitializeComponent();
        }

        private async void EnableMicrophone_ButtonClicked(
            object sender, RoutedEventArgs e)
        {
            bool isMicAvailable = true;
            try
            {
                var mediaCapture = new Windows.Media.Capture.MediaCapture();
                var settings = 
                    new Windows.Media.Capture.MediaCaptureInitializationSettings();
                settings.StreamingCaptureMode = 
                    Windows.Media.Capture.StreamingCaptureMode.Audio;
                await mediaCapture.InitializeAsync(settings);
            }
            catch (Exception)
            {
                isMicAvailable = false;
            }
            if (!isMicAvailable)
            {
                await Windows.System.Launcher.LaunchUriAsync(
                    new Uri("ms-settings:privacy-microphone"));
            }
            else
            {
                NotifyUser("Microphone was enabled", NotifyType.StatusMessage);
            }
        }

        private void NotifyUser(
            string strMessage, NotifyType type = NotifyType.StatusMessage)
        {
            // If called from the UI thread, then update immediately.
            // Otherwise, schedule a task on the UI thread to perform the update.
            if (Dispatcher.HasThreadAccess)
            {
                UpdateStatus(strMessage, type);
            }
            else
            {
                var task = Dispatcher.RunAsync(
                    Windows.UI.Core.CoreDispatcherPriority.Normal, 
                    () => UpdateStatus(strMessage, type));
            }
        }

        private void UpdateStatus(string strMessage, NotifyType type)
        {
            switch (type)
            {
                case NotifyType.StatusMessage:
                    StatusBorder.Background = new SolidColorBrush(
                        Windows.UI.Colors.Green);
                    break;
                case NotifyType.ErrorMessage:
                    StatusBorder.Background = new SolidColorBrush(
                        Windows.UI.Colors.Red);
                    break;
            }
            StatusBlock.Text += string.IsNullOrEmpty(StatusBlock.Text) 
                ? strMessage : "\n" + strMessage;

            if (!string.IsNullOrEmpty(StatusBlock.Text))
            {
                StatusBorder.Visibility = Visibility.Visible;
                StatusPanel.Visibility = Visibility.Visible;
            }
            else
            {
                StatusBorder.Visibility = Visibility.Collapsed;
                StatusPanel.Visibility = Visibility.Collapsed;
            }
            // Raise an event if necessary to enable a screen reader 
            // to announce the status update.
            var peer = Windows.UI.Xaml.Automation.Peers.FrameworkElementAutomationPeer.FromElement(StatusBlock);
            if (peer != null)
            {
                peer.RaiseAutomationEvent(
                    Windows.UI.Xaml.Automation.Peers.AutomationEvents.LiveRegionChanged);
            }
        }

        // Waits for and accumulates all audio associated with a given 
        // PullAudioOutputStream and then plays it to the MediaElement. Long spoken 
        // audio will create extra latency and a streaming playback solution 
        // (that plays audio while it continues to be received) should be used -- 
        // see the samples for examples of this.
        private void SynchronouslyPlayActivityAudio(
            PullAudioOutputStream activityAudio)
        {
            var playbackStreamWithHeader = new MemoryStream();
            playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("RIFF"), 0, 4); // ChunkID
            playbackStreamWithHeader.Write(BitConverter.GetBytes(UInt32.MaxValue), 0, 4); // ChunkSize: max
            playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("WAVE"), 0, 4); // Format
            playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("fmt "), 0, 4); // Subchunk1ID
            playbackStreamWithHeader.Write(BitConverter.GetBytes(16), 0, 4); // Subchunk1Size: PCM
            playbackStreamWithHeader.Write(BitConverter.GetBytes(1), 0, 2); // AudioFormat: PCM
            playbackStreamWithHeader.Write(BitConverter.GetBytes(1), 0, 2); // NumChannels: mono
            playbackStreamWithHeader.Write(BitConverter.GetBytes(16000), 0, 4); // SampleRate: 16kHz
            playbackStreamWithHeader.Write(BitConverter.GetBytes(32000), 0, 4); // ByteRate
            playbackStreamWithHeader.Write(BitConverter.GetBytes(2), 0, 2); // BlockAlign
            playbackStreamWithHeader.Write(BitConverter.GetBytes(16), 0, 2); // BitsPerSample: 16-bit
            playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("data"), 0, 4); // Subchunk2ID
            playbackStreamWithHeader.Write(BitConverter.GetBytes(UInt32.MaxValue), 0, 4); // Subchunk2Size

            byte[] pullBuffer = new byte[2056];

            uint lastRead = 0;
            do
            {
                lastRead = activityAudio.Read(pullBuffer);
                playbackStreamWithHeader.Write(pullBuffer, 0, (int)lastRead);
            }
            while (lastRead == pullBuffer.Length);

            var task = Dispatcher.RunAsync(
                Windows.UI.Core.CoreDispatcherPriority.Normal, () =>
            {
                mediaElement.SetSource(
                    playbackStreamWithHeader.AsRandomAccessStream(), "audio/wav");
                mediaElement.Play();
            });
        }

        private void InitializeDialogServiceConnector()
        {
            // New code will go here
        }

        private async void ListenButton_ButtonClicked(
            object sender, RoutedEventArgs e)
        {
            // New code will go here
        }
    }
}

Adicione o snippet de código a seguir ao corpo do método de InitializeDialogServiceConnector. Esse código criar o DialogServiceConnector com as informações de assinatura.

// Create a BotFrameworkConfig by providing a Speech service subscription key
// the botConfig.Language property is optional (default en-US)
const string speechSubscriptionKey = "YourSpeechSubscriptionKey"; // Your subscription key
const string region = "YourServiceRegion"; // Your subscription service region.

var botConfig = BotFrameworkConfig.FromSubscription(speechSubscriptionKey, region);
botConfig.Language = "en-US";
connector = new DialogServiceConnector(botConfig);

Observação

Confira a lista de regiões compatíveis com assistentes de voz e implante seus recursos em uma dessas regiões.

Observação

Para saber mais sobre como configurar, confira a documentação do Bot Framework para o canal de Fala do Direct Line.

Substitua as cadeias de caracteres YourSpeechSubscriptionKey e YourServiceRegion por seus próprios valores da assinatura de fala e pela região.

Acrescente o snippet de código a seguir ao final do corpo do método de InitializeDialogServiceConnector. Esse código define manipuladores de eventos utilizados por DialogServiceConnector para comunicar suas atividades de bot, resultados de reconhecimento de fala e outras informações.

// ActivityReceived is the main way your bot will communicate with the client 
// and uses bot framework activities
connector.ActivityReceived += (sender, activityReceivedEventArgs) =>
{
    NotifyUser(
        $"Activity received, hasAudio={activityReceivedEventArgs.HasAudio} activity={activityReceivedEventArgs.Activity}");

    if (activityReceivedEventArgs.HasAudio)
    {
        SynchronouslyPlayActivityAudio(activityReceivedEventArgs.Audio);
    }
};

// Canceled will be signaled when a turn is aborted or experiences an error condition
connector.Canceled += (sender, canceledEventArgs) =>
{
    NotifyUser($"Canceled, reason={canceledEventArgs.Reason}");
    if (canceledEventArgs.Reason == CancellationReason.Error)
    {
        NotifyUser(
            $"Error: code={canceledEventArgs.ErrorCode}, details={canceledEventArgs.ErrorDetails}");
    }
};

// Recognizing (not 'Recognized') will provide the intermediate recognized text 
// while an audio stream is being processed
connector.Recognizing += (sender, recognitionEventArgs) =>
{
    NotifyUser($"Recognizing! in-progress text={recognitionEventArgs.Result.Text}");
};

// Recognized (not 'Recognizing') will provide the final recognized text 
// once audio capture is completed
connector.Recognized += (sender, recognitionEventArgs) =>
{
    NotifyUser($"Final speech to text result: '{recognitionEventArgs.Result.Text}'");
};

// SessionStarted will notify when audio begins flowing to the service for a turn
connector.SessionStarted += (sender, sessionEventArgs) =>
{
    NotifyUser($"Now Listening! Session started, id={sessionEventArgs.SessionId}");
};

// SessionStopped will notify when a turn is complete and 
// it's safe to begin listening again
connector.SessionStopped += (sender, sessionEventArgs) =>
{
    NotifyUser($"Listening complete. Session ended, id={sessionEventArgs.SessionId}");
};

Adicione o snippet de código a seguir ao corpo do método ListenButton_ButtonClicked na classe MainPage. Esse código configura DialogServiceConnector para escutar, pois você já estabeleceu a configuração e registrou os manipuladores de eventos.

if (connector == null)
{
    InitializeDialogServiceConnector();
    // Optional step to speed up first interaction: if not called, 
    // connection happens automatically on first use
    var connectTask = connector.ConnectAsync();
}

try
{
    // Start sending audio to your speech-enabled bot
    var listenTask = connector.ListenOnceAsync();

    // You can also send activities to your bot as JSON strings -- 
    // Microsoft.Bot.Schema can simplify this
    string speakActivity = 
        @"{""type"":""message"",""text"":""Greeting Message"", ""speak"":""Hello there!""}";
    await connector.SendActivityAsync(speakActivity);

}
catch (Exception ex)
{
    NotifyUser($"Exception: {ex.ToString()}", NotifyType.ErrorMessage);
}

Compilar e executar o aplicativo

Agora está tudo pronto para você compilar o aplicativo e testar o assistente de voz personalizado usando o serviço de Fala.

Na barra de menus, escolha Compilar>Compilar Solução para compilar o aplicativo. Agora, o código deve compilar sem erros.
Escolha Depurar>Iniciar Depuração (ou pressione F5) para iniciar o aplicativo. A janela helloworld é exibida.
Selecione Habilitar Microfone e, quando a solicitação de permissão de acesso for exibida, selecione Sim.
Selecione Falar com o bot e fale uma frase ou uma sentença em inglês no microfone do dispositivo. Sua fala será transmitida para o canal de Fala do Direct Line e transcrita em texto, que será exibida na janela.

Próximas etapas

Explorar amostras de C# no GitHub

Veja ou baixe todas as Amostras em Java do SDK de Fala no GitHub.

Escolha o ambiente de destino

Java Runtime
Android

Pré-requisitos

Antes de começar, é preciso:

Criar um recurso de Fala
Configurar seu ambiente de desenvolvimento e criar um projeto vazio
Criar um bot conectado ao canal de Fala do Direct Line
Verificar se você tem acesso a um microfone para captura de áudio

Observação

Confira a lista de regiões compatíveis com assistentes de voz e implante seus recursos em uma dessas regiões.

Criar e configurar o projeto

Criar um projeto do Eclipse e instalar o SDK de Fala.

Além disso, para habilitar o log, atualize o arquivo pom.xml para incluir a seguinte dependência:

 <dependency>
     <groupId>org.slf4j</groupId>
     <artifactId>slf4j-simple</artifactId>
     <version>1.7.5</version>
 </dependency>

Adicionar código de exemplo

Para adicionar uma nova classe vazia ao projeto Java, selecione Arquivo>Novo>Classe.
Na janela Nova Classe Java, insira speechsdk.quickstart no campo Pacote e Principal no campo Nome.

Abra a classe Main recém-criada e substitua o conteúdo do arquivo Main.java pelo seguinte código inicial:

package speechsdk.quickstart;

import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
import com.microsoft.cognitiveservices.speech.audio.PullAudioOutputStream;
import com.microsoft.cognitiveservices.speech.dialog.BotFrameworkConfig;
import com.microsoft.cognitiveservices.speech.dialog.DialogServiceConnector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.SourceDataLine;
import java.io.InputStream;

public class Main {
    final Logger log = LoggerFactory.getLogger(Main.class);

    public static void main(String[] args) {
        // New code will go here
    }

    private void playAudioStream(PullAudioOutputStream audio) {
        ActivityAudioStream stream = new ActivityAudioStream(audio);
        final ActivityAudioStream.ActivityAudioFormat audioFormat = stream.getActivityAudioFormat();
        final AudioFormat format = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                audioFormat.getSamplesPerSecond(),
                audioFormat.getBitsPerSample(),
                audioFormat.getChannels(),
                audioFormat.getFrameSize(),
                audioFormat.getSamplesPerSecond(),
                false);
        try {
            int bufferSize = format.getFrameSize();
            final byte[] data = new byte[bufferSize];

            SourceDataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
            SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(format);

            if (line != null) {
                line.start();
                int nBytesRead = 0;
                while (nBytesRead != -1) {
                    nBytesRead = stream.read(data);
                    if (nBytesRead != -1) {
                        line.write(data, 0, nBytesRead);
                    }
                }
                line.drain();
                line.stop();
                line.close();
            }
            stream.close();

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

No método main, primeiro você configurará o DialogServiceConfig e o usará para criar uma instância do DialogServiceConnector. Essa instância se conecta ao canal de Fala do Direct Line para interagir com o bot. Uma instância de AudioConfig também é usada para especificar a origem da entrada de áudio. Neste exemplo, o microfone padrão é usado com AudioConfig.fromDefaultMicrophoneInput().
- Substitua a cadeia de caracteres YourSubscriptionKey pela chave de recurso do Serviço Cognitivo do Azure para Fala, que você pode obter do portal do Azure.
- Substitua a cadeia de caracteres YourServiceRegion pela região associada ao seu recurso do Serviço Cognitivo do Azure para Fala.
Observação

Confira a lista de regiões compatíveis com assistentes de voz e implante seus recursos em uma dessas regiões.
```
final String subscriptionKey = "YourSubscriptionKey"; // Your subscription key
final String region = "YourServiceRegion"; // Your speech subscription service region
final BotFrameworkConfig botConfig = BotFrameworkConfig.fromSubscription(subscriptionKey, region);

// Configure audio input from a microphone.
final AudioConfig audioConfig = AudioConfig.fromDefaultMicrophoneInput();

// Create a DialogServiceConnector instance.
final DialogServiceConnector connector = new DialogServiceConnector(botConfig, audioConfig);
```

O conector DialogServiceConnector depende de vários eventos para comunicar as atividades do bot, os resultados de reconhecimento de fala e outras informações. Adicione esses ouvintes de eventos em seguida.

// Recognizing will provide the intermediate recognized text while an audio stream is being processed.
connector.recognizing.addEventListener((o, speechRecognitionResultEventArgs) -> {
    log.info("Recognizing speech event text: {}", speechRecognitionResultEventArgs.getResult().getText());
});

// Recognized will provide the final recognized text once audio capture is completed.
connector.recognized.addEventListener((o, speechRecognitionResultEventArgs) -> {
    log.info("Recognized speech event reason text: {}", speechRecognitionResultEventArgs.getResult().getText());
});

// SessionStarted will notify when audio begins flowing to the service for a turn.
connector.sessionStarted.addEventListener((o, sessionEventArgs) -> {
    log.info("Session Started event id: {} ", sessionEventArgs.getSessionId());
});

// SessionStopped will notify when a turn is complete and it's safe to begin listening again.
connector.sessionStopped.addEventListener((o, sessionEventArgs) -> {
    log.info("Session stopped event id: {}", sessionEventArgs.getSessionId());
});

// Canceled will be signaled when a turn is aborted or experiences an error condition.
connector.canceled.addEventListener((o, canceledEventArgs) -> {
    log.info("Canceled event details: {}", canceledEventArgs.getErrorDetails());
    connector.disconnectAsync();
});

// ActivityReceived is the main way your bot will communicate with the client and uses Bot Framework activities.
connector.activityReceived.addEventListener((o, activityEventArgs) -> {
    final String act = activityEventArgs.getActivity().serialize();
        log.info("Received activity {} audio", activityEventArgs.hasAudio() ? "with" : "without");
        if (activityEventArgs.hasAudio()) {
            playAudioStream(activityEventArgs.getAudio());
        }
    });

Conecte DialogServiceConnector à Fala do Direct Line invocando o método connectAsync(). Para testar seu bot, invoque o método listenOnceAsync para enviar a entrada de áudio do microfone. Além disso, use também o método sendActivityAsync para enviar uma atividade personalizada como uma cadeia de caracteres serializada. Essas atividades personalizadas podem fornecer dados adicionais que o bot usa na conversa.
```
connector.connectAsync();
// Start listening.
System.out.println("Say something ...");
connector.listenOnceAsync();

// connector.sendActivityAsync(...)
```
Salve as alterações no arquivo Main.
Para dar suporte à reprodução de resposta, adicione uma classe extra que transforma o objeto PullAudioOutputStream retornado da API getAudio() em um InputStream Java para facilitar a manipulação. Este ActivityAudioStream é uma classe especializada que manipula a resposta de áudio do canal de Fala do Direct Line. Ele fornece acessadores para efetuar fetch das informações de formato de áudio necessárias para manipular a reprodução. Para fazer isso, selecione Arquivo>Novo>Classe.
Na janela Nova Classe Java, insira speechsdk.quickstart no campo Pacote e ActivityAudioStream no campo Nome.

Abra a classe ActivityAudioStream recém-criada e substitua-a pelo seguinte código:

package com.speechsdk.quickstart;

import com.microsoft.cognitiveservices.speech.audio.PullAudioOutputStream;

import java.io.IOException;
import java.io.InputStream;

 public final class ActivityAudioStream extends InputStream {
     /**
      * The number of samples played per second (16 kHz).
      */
     public static final long SAMPLE_RATE = 16000;
     /**
      * The number of bits in each sample of a sound that has this format (16 bits).
      */
     public static final int BITS_PER_SECOND = 16;
     /**
      * The number of audio channels in this format (1 for mono).
      */
     public static final int CHANNELS = 1;
     /**
      * The number of bytes in each frame of a sound that has this format (2).
      */
     public static final int FRAME_SIZE = 2;

     /**
      * Reads up to a specified maximum number of bytes of data from the audio
      * stream, putting them into the given byte array.
      *
      * @param b   the buffer into which the data is read
      * @param off the offset, from the beginning of array <code>b</code>, at which
      *            the data will be written
      * @param len the maximum number of bytes to read
      * @return the total number of bytes read into the buffer, or -1 if there
      * is no more data because the end of the stream has been reached
      */
     @Override
     public int read(byte[] b, int off, int len) {
         byte[] tempBuffer = new byte[len];
         int n = (int) this.pullStreamImpl.read(tempBuffer);
         for (int i = 0; i < n; i++) {
             if (off + i > b.length) {
                 throw new ArrayIndexOutOfBoundsException(b.length);
             }
             b[off + i] = tempBuffer[i];
         }
         if (n == 0) {
             return -1;
         }
         return n;
     }

     /**
      * Reads the next byte of data from the activity audio stream if available.
      *
      * @return the next byte of data, or -1 if the end of the stream is reached
      * @see #read(byte[], int, int)
      * @see #read(byte[])
      * @see #available
      * <p>
      */
     @Override
     public int read() {
         byte[] data = new byte[1];
         int temp = read(data);
         if (temp <= 0) {
             // we have a weird situation if read(byte[]) returns 0!
             return -1;
         }
         return data[0] & 0xFF;
     }

     /**
      * Reads up to a specified maximum number of bytes of data from the activity audio stream,
      * putting them into the given byte array.
      *
      * @param b the buffer into which the data is read
      * @return the total number of bytes read into the buffer, or -1 if there
      * is no more data because the end of the stream has been reached
      */
     @Override
     public int read(byte[] b) {
         int n = (int) pullStreamImpl.read(b);
         if (n == 0) {
             return -1;
         }
         return n;
     }

     /**
      * Skips over and discards a specified number of bytes from this
      * audio input stream.
      *
      * @param n the requested number of bytes to be skipped
      * @return the actual number of bytes skipped
      * @throws IOException if an input or output error occurs
      * @see #read
      * @see #available
      */
     @Override
     public long skip(long n) {
         if (n <= 0) {
             return 0;
         }
         if (n <= Integer.MAX_VALUE) {
             byte[] tempBuffer = new byte[(int) n];
             return read(tempBuffer);
         }
         long count = 0;
         for (long i = n; i > 0; i -= Integer.MAX_VALUE) {
             int size = (int) Math.min(Integer.MAX_VALUE, i);
             byte[] tempBuffer = new byte[size];
             count += read(tempBuffer);
         }
         return count;
     }

     /**
      * Closes this audio input stream and releases any system resources associated
      * with the stream.
      */
     @Override
     public void close() {
         this.pullStreamImpl.close();
     }

     /**
      * Fetch the audio format for the ActivityAudioStream. The ActivityAudioFormat defines the sample rate, bits per sample, and the # channels.
      *
      * @return instance of the ActivityAudioFormat associated with the stream
      */
     public ActivityAudioStream.ActivityAudioFormat getActivityAudioFormat() {
         return activityAudioFormat;
     }

     /**
      * Returns the maximum number of bytes that can be read (or skipped over) from this
      * audio input stream without blocking.
      *
      * @return the number of bytes that can be read from this audio input stream without blocking.
      * As this implementation does not buffer, this will be defaulted to 0
      */
     @Override
     public int available() {
         return 0;
     }

     public ActivityAudioStream(final PullAudioOutputStream stream) {
         pullStreamImpl = stream;
         this.activityAudioFormat = new ActivityAudioStream.ActivityAudioFormat(SAMPLE_RATE, BITS_PER_SECOND, CHANNELS, FRAME_SIZE, AudioEncoding.PCM_SIGNED);
     }

     private PullAudioOutputStream pullStreamImpl;

     private ActivityAudioFormat activityAudioFormat;

     /**
      * ActivityAudioFormat is an internal format which contains metadata regarding the type of arrangement of
      * audio bits in this activity audio stream.
      */
     static class ActivityAudioFormat {

         private long samplesPerSecond;
         private int bitsPerSample;
         private int channels;
         private int frameSize;
         private AudioEncoding encoding;

         public ActivityAudioFormat(long samplesPerSecond, int bitsPerSample, int channels, int frameSize, AudioEncoding encoding) {
             this.samplesPerSecond = samplesPerSecond;
             this.bitsPerSample = bitsPerSample;
             this.channels = channels;
             this.encoding = encoding;
             this.frameSize = frameSize;
         }

         /**
          * Fetch the number of samples played per second for the associated audio stream format.
          *
          * @return the number of samples played per second
          */
         public long getSamplesPerSecond() {
             return samplesPerSecond;
         }

         /**
          * Fetch the number of bits in each sample of a sound that has this audio stream format.
          *
          * @return the number of bits per sample
          */
         public int getBitsPerSample() {
             return bitsPerSample;
         }

         /**
          * Fetch the number of audio channels used by this audio stream format.
          *
          * @return the number of channels
          */
         public int getChannels() {
             return channels;
         }

         /**
          * Fetch the default number of bytes in a frame required by this audio stream format.
          *
          * @return the number of bytes
          */
         public int getFrameSize() {
             return frameSize;
         }

         /**
          * Fetch the audio encoding type associated with this audio stream format.
          *
          * @return the encoding associated
          */
         public AudioEncoding getEncoding() {
             return encoding;
         }
     }

     /**
      * Enum defining the types of audio encoding supported by this stream.
      */
     public enum AudioEncoding {
         PCM_SIGNED("PCM_SIGNED");

         String value;

         AudioEncoding(String value) {
             this.value = value;
         }
     }
 }

Salve as alterações no arquivo ActivityAudioStream.

Compilar e executar o aplicativo

Selecione F11 ou Executar>Depurar. O console exibe a mensagem "Diga algo". Nesse momento, fale uma frase ou uma sentença em inglês que o bot possa entender. Sua fala é transmitida para o bot por meio do canal de Fala do Direct Line, no qual é reconhecido e processado pelo bot. A resposta é retornada como uma atividade. Se o bot retornar uma fala como resposta, o áudio será reproduzido usando a classe AudioPlayer.

Captura de tela da saída do console após o reconhecimento com êxito

Próximas etapas

Explorar amostras de Java no GitHub

Pré-requisitos

Antes de começar, é preciso:

Criar um recurso de Fala
Configurar seu ambiente de desenvolvimento e criar um projeto vazio
Criar um bot conectado ao canal de Fala do Direct Line
Verificar se você tem acesso a um microfone para captura de áudio

Observação

Confira a lista de regiões compatíveis com assistentes de voz e implante seus recursos em uma dessas regiões.

Criar e configurar um projeto

Instalar o SDK de Fala usando o Android Studio.

Criar interface do usuário

Nesta seção, criaremos uma IU (interface do usuário) básica para o aplicativo. Vamos começar abrindo a atividade principal: activity_main.xml. O modelo básico inclui uma barra de título com o nome do aplicativo e uma TextView com a mensagem “Olá, Mundo!”.

Agora, substitua o conteúdo de activity_main.xml pelo seguinte código:

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
 xmlns:tools="http://schemas.android.com/tools"
 android:layout_width="match_parent"
 android:layout_height="match_parent"
 android:orientation="vertical"
 tools:context=".MainActivity">

 <Button
     android:id="@+id/button"
     android:layout_width="wrap_content"
     android:layout_height="wrap_content"
     android:layout_gravity="center"
     android:onClick="onBotButtonClicked"
     android:text="Talk to your bot" />

 <TextView
     android:layout_width="match_parent"
     android:layout_height="wrap_content"
     android:text="Recognition Data"
     android:textSize="18dp"
     android:textStyle="bold" />

 <TextView
     android:id="@+id/recoText"
     android:layout_width="match_parent"
     android:layout_height="wrap_content"
     android:text="  \n(Recognition goes here)\n" />

 <TextView
     android:layout_width="match_parent"
     android:layout_height="wrap_content"
     android:text="Activity Data"
     android:textSize="18dp"
     android:textStyle="bold" />

 <TextView
     android:id="@+id/activityText"
     android:layout_width="match_parent"
     android:layout_height="match_parent"
     android:scrollbars="vertical"
     android:text="  \n(Activities go here)\n" />

</LinearLayout>

Esse XML define uma interface do usuário simples para interação com seu bot.

O elemento button inicia uma interação e chama o método onBotButtonClicked quando clicado.
O elemento recoText exibe os resultados de conversão de fala em texto enquanto você fala com o bot.
O elemento activityText exibe o conteúdo JSON da atividade Bot Framework mais recente de seu bot.

O texto e a representação gráfica da sua interface do usuário agora devem ser semelhantes a isto:

Captura de tela de como deve ser a aparência da interface do usuário de Fale com seu bot.

Adicionar código de exemplo

Abra MainActivity.java e substitua o conteúdo pelo código a seguir:

 package samples.speech.cognitiveservices.microsoft.com;

 import android.media.AudioFormat;
 import android.media.AudioManager;
 import android.media.AudioTrack;
 import android.support.v4.app.ActivityCompat;
 import android.support.v7.app.AppCompatActivity;
 import android.os.Bundle;
 import android.text.method.ScrollingMovementMethod;
 import android.view.View;
 import android.widget.TextView;

 import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
 import com.microsoft.cognitiveservices.speech.audio.PullAudioOutputStream;
 import com.microsoft.cognitiveservices.speech.dialog.BotFrameworkConfig;
 import com.microsoft.cognitiveservices.speech.dialog.DialogServiceConnector;

 import org.json.JSONException;
 import org.json.JSONObject;

 import static android.Manifest.permission.*;

 public class MainActivity extends AppCompatActivity {
     // Replace below with your own speech subscription key
     private static String speechSubscriptionKey = "YourSpeechSubscriptionKey";
     // Replace below with your own speech service region
     private static String serviceRegion = "YourSpeechServiceRegion";

     private DialogServiceConnector connector;

     @Override
     protected void onCreate(Bundle savedInstanceState) {
         super.onCreate(savedInstanceState);
         setContentView(R.layout.activity_main);

         TextView recoText = (TextView) this.findViewById(R.id.recoText);
         TextView activityText = (TextView) this.findViewById(R.id.activityText);
         recoText.setMovementMethod(new ScrollingMovementMethod());
         activityText.setMovementMethod(new ScrollingMovementMethod());

         // Note: we need to request permissions for audio input and network access
         int requestCode = 5; // unique code for the permission request
         ActivityCompat.requestPermissions(MainActivity.this, new String[]{RECORD_AUDIO, INTERNET}, requestCode);
     }

     public void onBotButtonClicked(View v) {
         // Recreate the DialogServiceConnector on each button press, ensuring that the existing one is closed
         if (connector != null) {
             connector.close();
             connector = null;
         }

         // Create the DialogServiceConnector from speech subscription information
         BotFrameworkConfig config = BotFrameworkConfig.fromSubscription(speechSubscriptionKey, serviceRegion);
         connector = new DialogServiceConnector(config, AudioConfig.fromDefaultMicrophoneInput());

         // Optional step: preemptively connect to reduce first interaction latency
         connector.connectAsync();

         // Register the DialogServiceConnector's event listeners
         registerEventListeners();

         // Begin sending audio to your bot
         connector.listenOnceAsync();
     }

     private void registerEventListeners() {
         TextView recoText = (TextView) this.findViewById(R.id.recoText); // 'recoText' is the ID of your text view
         TextView activityText = (TextView) this.findViewById(R.id.activityText); // 'activityText' is the ID of your text view

         // Recognizing will provide the intermediate recognized text while an audio stream is being processed
         connector.recognizing.addEventListener((o, recoArgs) -> {
             recoText.setText("  Recognizing: " + recoArgs.getResult().getText());
         });

         // Recognized will provide the final recognized text once audio capture is completed
         connector.recognized.addEventListener((o, recoArgs) -> {
             recoText.setText("  Recognized: " + recoArgs.getResult().getText());
         });

         // SessionStarted will notify when audio begins flowing to the service for a turn
         connector.sessionStarted.addEventListener((o, sessionArgs) -> {
             recoText.setText("Listening...");
         });

         // SessionStopped will notify when a turn is complete and it's safe to begin listening again
         connector.sessionStopped.addEventListener((o, sessionArgs) -> {
         });

         // Canceled will be signaled when a turn is aborted or experiences an error condition
         connector.canceled.addEventListener((o, canceledArgs) -> {
             recoText.setText("Canceled (" + canceledArgs.getReason().toString() + ") error details: {}" + canceledArgs.getErrorDetails());
             connector.disconnectAsync();
         });

         // ActivityReceived is the main way your bot will communicate with the client and uses bot framework activities.
         connector.activityReceived.addEventListener((o, activityArgs) -> {
             try {
                 // Here we use JSONObject only to "pretty print" the condensed Activity JSON
                 String rawActivity = activityArgs.getActivity().serialize();
                 String formattedActivity = new JSONObject(rawActivity).toString(2);
                 activityText.setText(formattedActivity);
             } catch (JSONException e) {
                 activityText.setText("Couldn't format activity text: " + e.getMessage());
             }

             if (activityArgs.hasAudio()) {
                 // Text to speech audio associated with the activity is 16 kHz 16-bit mono PCM data
                 final int sampleRate = 16000;
                 int bufferSize = AudioTrack.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);

                 AudioTrack track = new AudioTrack(
                         AudioManager.STREAM_MUSIC,
                         sampleRate,
                         AudioFormat.CHANNEL_OUT_MONO,
                         AudioFormat.ENCODING_PCM_16BIT,
                         bufferSize,
                         AudioTrack.MODE_STREAM);

                 track.play();

                 PullAudioOutputStream stream = activityArgs.getAudio();

                 // Audio is streamed as it becomes available. Play it as it arrives.
                 byte[] buffer = new byte[bufferSize];
                 long bytesRead = 0;

                 do {
                     bytesRead = stream.read(buffer);
                     track.write(buffer, 0, (int) bytesRead);
                 } while (bytesRead == bufferSize);

                 track.release();
             }
         });
     }
 }

O método onCreate inclui código que exige permissões de microfone e Internet.
O método onBotButtonClicked é, conforme observado anteriormente, o manipulador de clique do botão. O pressionamento de um botão dispara uma interação única (“turno”) com o bot.
O método registerEventListeners demonstra os eventos usados pelo DialogServiceConnector e o tratamento básico das atividades de entrada.

No mesmo arquivo, substitua as cadeia de caracteres de configuração para corresponder aos seus recursos:
- Substitua YourSpeechSubscriptionKey por sua chave de assinatura.
- Substitua YourServiceRegion pela região associada à assinatura. Atualmente, somente um subconjunto de regiões do Serviço de Fala são compatíveis atualmente com a Direct Line. Para saber mais, confira regiões.

Compilar e executar o aplicativo

Conecte o dispositivo Android ao PC de desenvolvimento. Verifique se você habilitou o modo de desenvolvimento e depuração de USB no dispositivo.
Para compilar o aplicativo, pressione Ctrl+F9 ou escolha Construir>Criar um Projeto na barra de menus.
Para inicializar o aplicativo, pressione Shift+F10 ou escolha Executar>Executar 'aplicativo' .
Na janela de destino da implantação exibida, escolha seu dispositivo Android.

Depois que o aplicativo e sua atividade forem iniciados, clique no botão para começar a falar com o bot. O texto transcrito aparecerá enquanto você fala e a última atividade recebida do bot aparecerá quando for recebida. Se seu bot estiver configurado para dar respostas faladas, a conversão de fala em texto será reproduzida automaticamente.

Captura de tela do aplicativo Android

Próximas etapas

Explorar amostras de Java no GitHub

Veja ou baixe todas as Amostras em Go do SDK de Fala no GitHub.

Pré-requisitos

Antes de começar:

Criar um recurso de Fala
Configurar seu ambiente de desenvolvimento e criar um projeto vazio
Criar um bot conectado ao canal de Fala do Direct Line
Verificar se você tem acesso a um microfone para captura de áudio

Observação

Confira a lista de regiões compatíveis com assistentes de voz e implante seus recursos em uma dessas regiões.

Configurar seu ambiente

Atualize o arquivo go.mod com a versão mais recente do SDK adicionando esta linha

require (
    github.com/Microsoft/cognitive-services-speech-sdk-go v1.15.0
)

Comece com código de texto clichê

Substitua o conteúdo do arquivo de origem (por exemplo, quickstart.go) pelo seguinte, que inclui:

definição do pacote "principal"
importar os módulos necessários do SDK de Fala
variáveis para armazenar as informações de bot que serão substituídas posteriormente neste início rápido
uma implementação simples usando o microfone para entrada de áudio
manipuladores de eventos para vários eventos que ocorrem durante a interação de fala

package main

import (
    "fmt"
    "time"

    "github.com/Microsoft/cognitive-services-speech-sdk-go/audio"
    "github.com/Microsoft/cognitive-services-speech-sdk-go/dialog"
    "github.com/Microsoft/cognitive-services-speech-sdk-go/speech"
)

func main() {
    subscription :=  "YOUR_SUBSCRIPTION_KEY"
    region := "YOUR_BOT_REGION"

    audioConfig, err := audio.NewAudioConfigFromDefaultMicrophoneInput()
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer audioConfig.Close()
    config, err := dialog.NewBotFrameworkConfigFromSubscription(subscription, region)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer config.Close()
    connector, err := dialog.NewDialogServiceConnectorFromConfig(config, audioConfig)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer connector.Close()
    activityReceivedHandler := func(event dialog.ActivityReceivedEventArgs) {
        defer event.Close()
        fmt.Println("Received an activity.")
    }
    connector.ActivityReceived(activityReceivedHandler)
    recognizedHandle := func(event speech.SpeechRecognitionEventArgs) {
        defer event.Close()
        fmt.Println("Recognized ", event.Result.Text)
    }
    connector.Recognized(recognizedHandle)
    recognizingHandler := func(event speech.SpeechRecognitionEventArgs) {
        defer event.Close()
        fmt.Println("Recognizing ", event.Result.Text)
    }
    connector.Recognizing(recognizingHandler)
    connector.ListenOnceAsync()
    <-time.After(10 * time.Second)
}

Substitua os valores YOUR_SUBSCRIPTION_KEY e YOUR_BOT_REGION pelos valores reais do recurso de Fala.

Navegue até o portal do Azure e abra o recurso de Fala
Em Chaves e Ponto de Extremidade à esquerda, há duas chaves de assinatura disponíveis
- Use qualquer uma das duas como a substituição do valor YOUR_SUBSCRIPTION_KEY
Na Visão Geral à esquerda, observe a região e mapeie-a para o identificador da região
- Use o Identificador da região como a substituição do valor YOUR_BOT_REGION, por exemplo: "westus" para Oeste dos EUA
Observação

Confira a lista de regiões compatíveis com assistentes de voz e implante seus recursos em uma dessas regiões.

Observação

Para saber mais sobre como configurar, confira a documentação do Bot Framework para o canal de Fala do Direct Line.

Explicação de código

A região e a chave de assinatura da Fala são necessárias para criar um objeto de configuração de fala. O objeto de configuração é necessário para criar uma instância de um objeto de reconhecedor de fala.

A instância do reconhecedor expõe várias maneiras de reconhecer a fala. Neste exemplo, a fala é reconhecida continuamente. Essa funcionalidade informa ao serviço de Fala que você está enviando várias frases para reconhecimento e quando o programa é encerrado para interromper o reconhecimento de fala. Conforme os resultados são gerados, o código os grava no console.

Criar e executar

Agora está tudo pronto para você compilar o projeto e testar o assistente de voz personalizado usando o serviço de Fala.

Compile seu projeto, por exemplo, "go build"
Execute o módulo e fale uma frase ou uma sentença no microfone do dispositivo. Sua fala será transmitida para o canal de Fala do Direct Line e transcrita em texto, que aparecerá como a saída.

Observação

O SDK de Fala usará como padrão o reconhecimento do uso de en-us como idioma; confira Como reconhecer a fala para obter informações sobre como escolher o idioma de origem.

Próximas etapas

Explorar amostras de Go no GitHub

Compatibilidade com outros idiomas e plataformas

Se você clicou nessa guia, provavelmente não viu um início rápido em sua linguagem de programação favorita. Não se preocupe, temos outros materiais de início rápido e exemplos de código disponíveis no GitHub. Use a tabela para encontrar o exemplo correto para a sua linguagem de programação e combinação de plataforma/sistema operacional.

Linguagem	Exemplos de código
C#	.NET Framework, .NET Core, UWP, Unity, Xamarin
C++	Windows, Linux, macOS
Java	Android, JRE
JavaScript	Browser, Node.js
Objective-C	iOS, macOS
Python	Windows, Linux, macOS
Swift	iOS, macOS

Share via

Início Rápido: Assistente Criar uma voz personalizada

Pré-requisitos

Abra o projeto no Visual Studio

Comece com código de texto clichê

Compilar e executar o aplicativo

Próximas etapas

Pré-requisitos

Criar e configurar o projeto

Adicionar código de exemplo

Compilar e executar o aplicativo

Próximas etapas

Pré-requisitos

Criar e configurar um projeto

Criar interface do usuário

Adicionar código de exemplo

Compilar e executar o aplicativo

Próximas etapas

Pré-requisitos

Configurar seu ambiente

Comece com código de texto clichê

Explicação de código

Criar e executar

Próximas etapas

Compatibilidade com outros idiomas e plataformas

Recursos adicionais