Installing plugins on Azure Synapse Workers
I am trying to run the Cognitive Services Speech-to-Text sample in Python on Azure Synapse Analytics. LINK: samples-python
This is the example from that link that I want to run:
# Assumption: the MMLSpark sample imports the transformer from mmlspark.cognitive;
# newer SynapseML releases expose it under synapse.ml.cognitive instead
from mmlspark.cognitive import SpeechToTextSDK

# service_key is assumed to hold the Cognitive Services subscription key
# Create a dataframe with our audio URLs, tied to the column called "url"
df = spark.createDataFrame(
    [
        ("https://mmlspark.blob.core.windows.net/datasets/Speech/audio2.wav",),
        ("https://mmlspark.blob.core.windows.net/datasets/Speech/audio3.mp3",),
    ],
    ["url"],
)
# Run the Speech-to-text service to transcribe the audio into text
speech_to_text = (SpeechToTextSDK()
    .setSubscriptionKey(service_key)
    .setLocation("eastus")
    .setOutputCol("text")
    .setAudioDataCol("url")
    .setLanguage("en-US")
    .setProfanity("Masked"))
# Show the results of the transcription
display(speech_to_text.transform(df).select("url", "text.DisplayText"))
According to the feedback I received on the Microsoft Cognitive Services sample documentation (LINK: feedback), I need to install a GStreamer plugin on Azure Synapse in order to transcribe MP3 files.
The author of that reply provided the following bash script, which relies on sudo, to install the plugin:
#!/bin/bash
sudo add-apt-repository ppa:jonathonf/ffmpeg-4 -y
sudo apt-get update
sudo apt-get install ffmpeg -y
sudo apt-get install libgstreamer1.0-0 gstreamer1.0-plugins-base gstreamer1.0-plugins-good -y
How can I run this script on an Azure Synapse Spark cluster to install the plugin?
The thread also suggests running an .msi file to install the plugin, but that seems to apply only to Python running on a local computer, not to Azure Synapse.
Any ideas on how to install plugins on Azure Synapse so that I can run the Cognitive Services speech-to-text sample?
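For reference, here is a minimal sketch of how one might check whether ffmpeg and the GStreamer core library are already visible on the Spark nodes before trying to install anything. It assumes Linux nodes and, for the executor check, the `spark` session that the notebook provides. The function names `check_media_dependencies` and `check_on_executors` are hypothetical helpers and not part of the sample or of any Azure API.

```python
import shutil
from ctypes.util import find_library


def check_media_dependencies():
    """Report what this machine can see of ffmpeg and GStreamer.

    Each value is a string (path or library name) when found, else None.
    """
    return {
        # Path of the ffmpeg executable on PATH, if any
        "ffmpeg": shutil.which("ffmpeg"),
        # Name of the GStreamer core shared library, if the linker can find it
        "gstreamer": find_library("gstreamer-1.0"),
    }


def check_on_executors(spark, partitions=4):
    """Run the same check inside Spark tasks (assumes an active SparkSession).

    Tasks are not guaranteed to land on every executor, so this is only
    a rough sample of the worker nodes.
    """
    rdd = spark.sparkContext.parallelize(range(partitions), partitions)
    return rdd.map(lambda _: check_media_dependencies()).collect()


if __name__ == "__main__":
    # Driver-side check only; the executor check needs a live Spark session
    print(check_media_dependencies())
```

In a notebook cell, `check_on_executors(spark)` would return one dictionary per task, which shows whether the workers differ from the driver.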