Automatic Playback of Audio Stream in Browser with Text-to-Speech Streaming

Huang Diamond 5 Reputation points
2023-12-26T10:47:36.9+00:00

Hello,

I'm currently working on developing a customer service chatbot using Azure's speech service. My project involves using React.js for the client side and FastAPI for the server side. While the setup generally works well, I've encountered an unusual issue: the browser automatically plays the audio stream received from the server, even without an <audio> HTML tag present in my application. Moreover, my code for handling audio streaming isn't functioning as expected.

Here's the relevant React code for reference:

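import { useEffect, useRef, useState } from 'react';
// useAudioRecorder, sendSpeechToTextRequest, sendTextToSpeechRequest and
// SpeechRecorderProps are the project's own hooks, helpers and types, defined elsewhere.

const SpeechRecongnition = ({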
  setMessages,
  getHistory,
}: SpeechRecorderProps) => {
  const {
    startRecording,
    stopRecording,
    audioBlob,
    isRecording,
    clearAudioBlob,
  } = useAudioRecorder();
  const audioRef = useRef<HTMLAudioElement>(null);
  const [audioUrl, setAudioUrl] = useState<string | undefined>(undefined);

  const toggleRecording = () => {
    if (isRecording) {
      stopRecording();
    } else {
      startRecording();
    }
  };
  /**
   * TODO: solve the error "Uncaught (in promise) DOMException: Failed to load because no supported source was found."
   * Use the stream to send the speech back.
   */
  const sendDataToServer = async (audioBlob: Blob | null) => {
    if (audioBlob) {
      // Create a FormData object
      const formData = new FormData();
      // Append the audio data to the FormData object
      formData.append('data', audioBlob, 'recording.wav');
      const response = await sendSpeechToTextRequest(
        formData,
        `${process.env.REACT_APP_API_BASE_URL}/chat-speech-to-text/`
      );
      const responseAudio = await sendTextToSpeechRequest(
        response,
        `${process.env.REACT_APP_API_BASE_URL}/chat-text-to-speech/`
      );
      setAudioUrl(responseAudio.audio); // Set the audio URL
      await getHistory(setMessages);
    } else {
      console.log('no audio chunks or audio is not recorded correctly!');
    }
  };

  // Play the audio whenever a new audioUrl is set
  useEffect(() => {
    if (audioUrl && audioRef.current) {
      const audio = audioRef.current;
      audio.src = audioUrl;

      const audioPromise = audio.play();
      if (audioPromise !== undefined) {
        audioPromise
          .then(() => {
            console.log('play success');
          })
          .catch((error) => {
            console.log('play error', error);
            console.log(`Failed audio URL: ${audioUrl}`); // Log the problematic URL
          });
      }

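      audio.onended = () => {
        URL.revokeObjectURL(audioUrl); // Release the blob URL created for this clip
        setAudioUrl(undefined); // Clear the audioUrl state
        audio.removeAttribute('src'); // Detach the source (setting src = '' fires an error event)
        audio.load(); // Reset the element to its empty state
        clearAudioBlob(); // Clear the audio blob
      };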
    }
  }, [audioUrl]);
  // Send the recording to the server whenever a new audioBlob is available
  useEffect(() => {
    if (audioBlob) {
      sendDataToServer(audioBlob);
    }
  }, [audioBlob]);

  return (
    <div>
      <button onClick={toggleRecording}>
        {isRecording ? 'Stop' : 'Start'}
      </button>
      <audio ref={audioRef} controls hidden />
    </div>
  );
};

export default SpeechRecongnition;

This is how I handle the request in React:

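(Snippet garbled in extraction. Surviving fragments: a function with `userInput` and `endpoint` parameters and a `try` block that references `message`.)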

And here's the FastAPI handling code:

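(Snippet garbled in extraction. Surviving fragments: a decorated `async def` route handler that calls `response = response_from_LLM(data.message)` and then, inside nested `if`/`else` branches that each end in a `return`, calls `audio_stream = text_to_speech(response["response"])`.)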
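Whatever `text_to_speech` returns, the browser can only play it if the bytes form a container it recognizes. Assuming the route is meant to return WAV, a cheap guard before returning `audio_stream` is to confirm that the payload starts with a RIFF/WAVE header. The helpers below (`looks_like_wav`, `ensure_wav`) are hypothetical, not part of FastAPI or the Speech SDK, and they only inspect the 12-byte container header, so they cannot prove the rest of the file decodes.

```python
def looks_like_wav(payload: bytes) -> bool:
    """Heuristic check that `payload` begins with a RIFF/WAVE header.

    Hypothetical helper: only the first 12 bytes are inspected.
    """
    # WAV layout: b"RIFF" + 4-byte little-endian chunk size + b"WAVE".
    return len(payload) >= 12 and payload[:4] == b"RIFF" and payload[8:12] == b"WAVE"


def ensure_wav(payload: bytes) -> bytes:
    """Return `payload` unchanged, or raise if it does not look like WAV."""
    if not looks_like_wav(payload):
        # A JSON error body or an empty response would otherwise reach the
        # <audio> element and fail with "no supported source was found".
        raise ValueError("synthesized payload is not WAV; the browser may reject it")
    return payload
```

Failing loudly on the server makes the problem visible in the FastAPI logs instead of surfacing only as a `DOMException` in the browser.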

In the browser console, I've encountered the following error messages:

"play error DOMException: Failed to load because no supported source was found."

"Failed audio URL: blob:http://localhost:5173/2e0d0404-1cbf-4c88-9a50-d37dc0141ae1"

From these messages, it's clear that the <audio> element isn't working as intended. Even so, the audio clip still plays in the browser. This happens even after I removed the <audio> tag from my application and modified the request handler function to return an empty string:

  return (
    <div>
      <button onClick={toggleRecording}>
        {isRecording ? 'Stop' : 'Start'}
      </button>
    </div>
  );

I've confirmed that the issue isn't with the FastAPI server, as returning None in the route stops the audio playback.
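One hypothesis worth checking, offered as an assumption because the body of `text_to_speech` isn't shown: in the Azure Speech SDK for Python, a `SpeechSynthesizer` created without an explicit `audio_config` plays its output on the default speaker of the machine running the code. With FastAPI running locally, that speaker is on the same computer as the browser, so server-side playback would sound exactly like the browser playing the clip, with or without an `<audio>` tag. Passing `audio_config=None` keeps the result in memory (`result.audio_data`) instead. If the bytes then returned are raw PCM rather than a WAV file, the `<audio>` element cannot decode them, which could also explain the "no supported source" error. The stdlib-only sketch below wraps raw PCM in a WAV container; `pcm_to_wav` is a hypothetical helper, and its defaults (24 kHz, 16-bit, mono) are assumptions that must match the output format the synthesizer is actually configured for.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000, sample_width: int = 2,
               channels: int = 1) -> bytes:
    """Wrap raw little-endian PCM samples in a WAV (RIFF) container.

    Hypothetical helper: sample_rate, sample_width (bytes per sample) and
    channels are assumptions and must describe how the PCM was produced.
    """
    frame_size = sample_width * channels
    if len(pcm) % frame_size != 0:
        raise ValueError("PCM length is not a whole number of frames")
    buf = io.BytesIO()
    # wave writes the RIFF/fmt/data headers and patches the sizes on close;
    # it does not close `buf` because it did not open it.
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()
```

In a FastAPI route, the resulting bytes could be sent with `Response(content=wav_bytes, media_type="audio/wav")`; if the client builds its blob with `response.blob()`, the blob inherits that `audio/wav` type.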

Although the application seems to be working fine otherwise, I'm curious to understand why this automatic playback is happening. I would greatly appreciate any insights or suggestions on how to prevent the browser from automatically playing the audio stream and instead allow the <audio> tag to control playback.

Thank you in advance for your help!

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. Ramr-msft 17,821 Reputation points
    2023-12-26T16:39:22.08+00:00

    Thanks for the question. Regarding the Azure Speech Service audio streaming issue: the service supports streaming audio input, but the audio must meet certain requirements (by default, the Speech SDK's audio input stream expects 16 kHz, 16-bit, mono PCM). Make sure that your audio data meets these requirements and that you're correctly implementing the Azure Speech SDK's audio input stream functionality.
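A minimal sketch for checking an uploaded recording against that default input format before handing it to the SDK, using only Python's `wave` module. `check_wav_format` and the `EXPECTED_*` constants are assumptions for illustration. If `useAudioRecorder` is built on the browser's `MediaRecorder`, which does not produce WAV natively in major browsers, the upload may actually be WebM or Ogg despite the `recording.wav` filename, and a check like this would flag it.

```python
import io
import wave

# Assumed target: the Speech SDK's default input stream format.
EXPECTED_RATE = 16000
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample (16-bit)
EXPECTED_CHANNELS = 1


def check_wav_format(data: bytes) -> list[str]:
    """Return a list of problems; an empty list means the format matches.

    Hypothetical helper for illustration.
    """
    try:
        with wave.open(io.BytesIO(data), "rb") as r:
            rate, width, channels = r.getframerate(), r.getsampwidth(), r.getnchannels()
    except (wave.Error, EOFError):
        # Not RIFF/WAVE at all (e.g. WebM), truncated, or a non-PCM WAV.
        return ["not a readable PCM WAV file"]
    problems = []
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate {rate} Hz, expected {EXPECTED_RATE} Hz")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"{width * 8}-bit samples, expected {EXPECTED_SAMPLE_WIDTH * 8}-bit")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS}")
    return problems
```

Returning a list of problems rather than a boolean makes it easy to include the reason in an HTTP 400 response to the client.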

    To prevent automatic playback and allow the <audio> tag to control playback, you can try the following steps:

    1. Remove the autoplay attribute: If the autoplay attribute is present in your <audio> tag, you can try removing it to prevent automatic playback.
    2. Use the controls attribute: The controls attribute can be used to provide the user with controls to manage audio playback, including play, pause, seeking, and volume.

    Here are samples for the same.

