Installing plugins on Azure Synapse Workers

Iwan 16 Reputation points
2022-10-27T10:49:19.443+00:00

I am trying to run the Cognitive Services Speech-to-Text sample in Python on Azure Synapse Analytics. LINK: samples-python

The following code, taken from the above link, is the example I want to run.

# Import path assumed for SynapseML at the time of posting; older MMLSpark builds use mmlspark.cognitive
from synapse.ml.cognitive import SpeechToTextSDK

# service_key is assumed to hold the Cognitive Services subscription key (defined elsewhere)

# Create a dataframe with our audio URLs, tied to the column called "url"
df = spark.createDataFrame([("https://mmlspark.blob.core.windows.net/datasets/Speech/audio2.wav",),
                           ("https://mmlspark.blob.core.windows.net/datasets/Speech/audio3.mp3",)
                           ], ["url"])

# Run the Speech-to-text service to transcribe the audio into text
speech_to_text = (SpeechToTextSDK()
    .setSubscriptionKey(service_key)
    .setLocation("eastus")
    .setOutputCol("text")
    .setAudioDataCol("url")
    .setLanguage("en-US")
    .setProfanity("Masked"))

# Show the results of the transcription
display(speech_to_text.transform(df).select("url", "text.DisplayText"))

According to this feedback thread on the Microsoft Cognitive Services sample documentation (LINK: feedback), I need to install a GStreamer plugin on Azure Synapse in order to transcribe mp3 files into text.

The author of that thread has provided a bash script, which relies on sudo, to install the plugin.

#!/bin/bash  
sudo add-apt-repository ppa:jonathonf/ffmpeg-4 -y  
sudo apt-get update  
sudo apt-get install ffmpeg -y  
sudo apt-get install libgstreamer1.0-0 gstreamer1.0-plugins-base gstreamer1.0-plugins-good -y  
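To tell whether the script has any effect, it might help to check from a notebook cell whether these dependencies are already visible. The following is only a minimal sketch: the helper name check_audio_dependencies is my own, I am assuming that the GStreamer core library is discoverable under the name gstreamer-1.0, and a plain notebook cell would only inspect the node the cell runs on (normally the driver), not every worker.

```python
import ctypes.util
import shutil


def check_audio_dependencies():
    """Report whether ffmpeg and the GStreamer core library are visible on this node.

    Hypothetical helper: it only inspects the machine where it runs.
    """
    return {
        # True if an ffmpeg executable is found on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # True if the system linker can locate libgstreamer-1.0 (assumed library name)
        "gstreamer": ctypes.util.find_library("gstreamer-1.0") is not None,
    }


print(check_audio_dependencies())
```

If both values come back False, the mp3 row in the sample would presumably still fail until the plugin is installed some other way.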

How can I run this script on an Azure Synapse Spark cluster to install the plugin?

The thread also mentions another possible solution, running an .msi file to install the plugin, but that seems to apply only to local computers running Python, not to Azure Synapse.

Any idea how to install plugins onto Azure Synapse so I can run the Cognitive Services speech-to-text sample?

Azure Synapse Analytics