Hello Shubhanshi Gangil,
Welcome to the Microsoft Q&A and thank you for posting your questions here.
I understand that your project needs the Speech SDK for Python to support speaker recognition.
Speaker voice identification is not directly supported by the Speech SDK for Python, so additional tools or services are necessary. You can switch your focus to speaker voice embeddings, which are vector representations of a speaker's voice. They can be extracted using models trained on datasets such as VoxCeleb, or with projects such as DeepSBD, and then compared to classify speakers. Python libraries such as pyAudioAnalysis and librosa can also assist with audio analysis tasks and may help implement speaker differentiation with the right algorithms. However, achieving real-time speaker recognition with the Speech SDK for Python alone is not straightforward without integrating additional tools or services.
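To illustrate the embedding comparison mentioned above, here is a minimal sketch in plain Python. It assumes you already have one fixed-length embedding per enrolled speaker from whichever model you choose. The vectors, the function names `cosine_similarity` and `identify_speaker`, and the 0.7 threshold are all hypothetical placeholders for illustration, not part of the Speech SDK or any of the libraries listed here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def identify_speaker(embedding, enrolled, threshold=0.7):
    """Return (name, score) of the closest enrolled speaker.

    The name is None when no enrolled speaker reaches the threshold.
    The threshold value is an assumption and must be tuned on real data.
    """
    best_name, best_score = None, -1.0
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Placeholder embeddings; real ones typically have hundreds of dimensions.
enrolled = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.1, 0.8, 0.5],
}
unknown = [0.85, 0.15, 0.25]

name, score = identify_speaker(unknown, enrolled)
print(name, round(score, 3))
```

In practice you would replace the placeholder vectors with embeddings computed from enrollment recordings and from each incoming audio segment, and you would pick the threshold by testing on recordings from your own speakers.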
You may be interested in reviewing the links below, which cover Azure AI, the Azure Speech SDK for Python, and step-by-step resources for the speaker voice identification options above:
- https://www.robots.ox.ac.uk/~vgg/data/voxceleb
- https://github.com/soheilkhorram/DeepSBD
- https://github.com/tyiannak/pyAudioAnalysis/
- https://librosa.org/doc/latest/index.html
- https://github.com/tyiannak/pyAudioAnalysis/wiki/Speaker-Diarization
- https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/quickstarts/python
I hope this is helpful! Do not hesitate to let me know if you have any other questions.
Please don't forget to close the thread by upvoting this answer and accepting it if it is helpful.