call centre analytics

Mahesh Ch 6 Reputation points
2021-02-22T07:41:20.423+00:00

I was looking at the call centre analytics solution implemented by Microsoft. Speech to Text and Text Analytics give me a single merged transcript in the converted file. How do I get the agent and customer conversations separated? Could someone please guide me here?

Azure Speech in Foundry Tools
Azure Document Intelligence in Foundry Tools

2 answers

Sort by: Most helpful
  1. Saddaf Khan 6 Reputation points
    2021-12-03T07:43:57.683+00:00

    I have 10 years of experience in call center jobs. If you want any help or information about call centers, just ask without hesitation. Actually, working in a call center is like any other service or sales job. The main difference is that all customer interactions happen over the phone, either by dialing or answering calls every day. Call center life is hard work, but the hardest things in life are often the most rewarding.


    1 person found this answer helpful.

  2. Ramr-msft 17,836 Reputation points
    2021-02-22T12:04:56.857+00:00

    @Mahesh Ch Thanks for the question. Could you please share the sample code you are trying?
    Azure provides speaker identification within Speech Services, but in a call center scenario you do not need to identify who is speaking, and you cannot train the model beforehand with speaker voices, since a new caller rings in every time. You only need to distinguish the different voices when converting speech to text.

    The Microsoft Cognitive Services batch transcription API has the ability to identify the two voices separately (e.g. Speaker 0 - Agent, Speaker 1 - Customer when there are two speakers) when converting speech to text.
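    As a rough illustration, here is a minimal sketch of the JSON body you would POST to the batch transcription REST API (v3.0, `speechtotext/v3.0/transcriptions`) with diarization switched on. The audio URL and display name below are placeholders, not values from the original question:

    ```python
    import json

    def build_transcription_request(audio_url, locale="en-US"):
        """Build the JSON body for POST /speechtotext/v3.0/transcriptions."""
        return {
            "contentUrls": [audio_url],
            "locale": locale,
            "displayName": "Call centre transcription",
            "properties": {
                # Ask the service to label each phrase with a speaker number
                # (e.g. 0 and 1 for a two-speaker agent/customer call).
                "diarizationEnabled": True,
                "wordLevelTimestampsEnabled": True,
            },
        }

    body = build_transcription_request("https://example.com/call-recording.wav")
    print(json.dumps(body, indent=2))
    ```

    You would send this body to `https://<region>.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions` with your key in the `Ocp-Apim-Subscription-Key` header, then poll the returned transcription resource until the result files are ready.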

    Here is a blog post documenting this, with sample code combining Speech + Text Analytics: https://azure.microsoft.com/en-us/blog/using-text-analytics-in-call-centers/

    Video Indexer supports transcription, speaker diarization (enumeration), and emotion recognition from both the text and the tone of the voice. Additional insights are available as well, e.g. topic inference, language identification, brand detection, and translation. You can consume it via the video or audio-only APIs for COGS optimization.
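    Once the batch transcription finishes, each entry in the result's `recognizedPhrases` array carries a `speaker` number, which is what lets you split the agent and customer turns apart. A small sketch of that post-processing, using an invented sample result whose shape mimics the service output:

    ```python
    # Sample data shaped like a batch transcription result file; the phrase
    # text here is invented purely for illustration.
    sample_result = {
        "recognizedPhrases": [
            {"speaker": 1, "offset": "PT0S",
             "nBest": [{"display": "Thank you for calling, how can I help?"}]},
            {"speaker": 2, "offset": "PT4S",
             "nBest": [{"display": "Hi, I have a question about my bill."}]},
            {"speaker": 1, "offset": "PT8S",
             "nBest": [{"display": "Sure, let me pull up your account."}]},
        ]
    }

    def split_by_speaker(result):
        """Group the top transcription hypothesis of each phrase by speaker."""
        conversations = {}
        for phrase in result["recognizedPhrases"]:
            text = phrase["nBest"][0]["display"]
            conversations.setdefault(phrase["speaker"], []).append(text)
        return conversations

    conversations = split_by_speaker(sample_result)
    for speaker, lines in sorted(conversations.items()):
        print(f"Speaker {speaker}: {' '.join(lines)}")
    ```

    Mapping which speaker number is the agent versus the customer is then a heuristic on your side (for instance, the speaker of the first phrase is usually the agent).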


