Speech to Text Container (latest version, build number 20220822.1) Does Not Work

XS
2022-09-02T09:33:49.403+00:00

I pulled the latest Speech to Text container. Image info:
mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text latest 76ed654a9b87 10 days ago 16.3GB

I ran it with the following command, and it started up with no error messages:

docker run --rm -it -p 5000:5000 --name speech-to-text \
mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text \
Eula=accept \
Billing=https://eastasia.api.cognitive.microsoft.com/sts/v1.0/issuetoken \
ApiKey=xxxx   # my key (redacted)

But when I test with a voice file or with the microphone, the container only logs the following info messages (no errors):

2022-09-02 09:12:46.421696 084d57bd61f2499a9e1fda1cc675aa08 175 493 info HandleConnection: SpeechConfig(ch='')
000492: [2022-09-02 09:12:48.893] [mts] [warning] [audio-resampler] - convert_taps_gint16_c() @ audio-resampler.c:274 - can't find exact taps
2022-09-02 09:12:48.897054 f64041fc413c4c96848105fde0b521b4 175 362 info GrpcRecognizerConnection: connection peer ipv4:127.0.0.1:49248 client_cv f64041fc413c4c96848105fde0b521b4
2022-09-02 09:12:48.897178 f64041fc413c4c96848105fde0b521b4 175 362 info HandleConnection: SpeechConfig(ch='')
2022-09-02 09:12:48.951063 f64041fc413c4c96848105fde0b521b4 175 493 info GrpcRecognizerConnection: connection peer ipv4:127.0.0.1:49248 client_cv f64041fc413c4c96848105fde0b521b4
2022-09-02 09:12:48.951291 f64041fc413c4c96848105fde0b521b4 175 493 info HandleConnection: SpeechConfig(ch='')

The Java client code (Speech SDK) looks like this:
WavStream wavStream = new WavStream(Files.newInputStream(Paths.get(voiceFilePath)));
speechConfig.requestWordLevelTimestamps();
// recognition mode: INTERACTIVE, CONVERSATION, or DICTATION
speechConfig.setProperty(PropertyId.SpeechServiceConnection_RecoMode, "CONVERSATION");
speechConfig.setOutputFormat(OutputFormat.Detailed);
speechConfig.setSpeechRecognitionLanguage("zh-CN");
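The container log includes an audio-resampler warning ("can't find exact taps"), which may mean the input audio is being resampled. One thing worth ruling out is the format of the WAV file passed to WavStream. The sketch below uses only javax.sound.sampled and assumes the target is 16 kHz, 16-bit, mono, signed PCM (the Speech SDK's default input stream format). The class and method names are hypothetical, and a matching format would not by itself explain the 1011 error.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class WavFormatCheck {

    // Assumed target format: 16 kHz, 16-bit, mono, signed PCM.
    static final float EXPECTED_RATE = 16000f;
    static final int EXPECTED_BITS = 16;
    static final int EXPECTED_CHANNELS = 1;

    /** Returns "" if the format matches, otherwise a space-separated list of mismatches. */
    static String describeMismatch(AudioFormat fmt) {
        StringBuilder sb = new StringBuilder();
        if (fmt.getSampleRate() != EXPECTED_RATE) {
            sb.append("sampleRate=").append(fmt.getSampleRate()).append(' ');
        }
        if (fmt.getSampleSizeInBits() != EXPECTED_BITS) {
            sb.append("bits=").append(fmt.getSampleSizeInBits()).append(' ');
        }
        if (fmt.getChannels() != EXPECTED_CHANNELS) {
            sb.append("channels=").append(fmt.getChannels()).append(' ');
        }
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(fmt.getEncoding())) {
            sb.append("encoding=").append(fmt.getEncoding()).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException, UnsupportedAudioFileException {
        // args[0]: path to the same WAV file that is given to WavStream
        AudioFileFormat fileFormat = AudioSystem.getAudioFileFormat(new File(args[0]));
        String mismatch = describeMismatch(fileFormat.getFormat());
        System.out.println(mismatch.isEmpty()
            ? "OK: 16 kHz / 16-bit / mono PCM"
            : "Mismatch: " + mismatch);
    }
}
```

Run it as `java WavFormatCheck path/to/voice.wav`. If it reports a mismatch, converting the file to 16 kHz mono 16-bit PCM before sending it is a cheap experiment; the microphone path failing too suggests the format may not be the whole story.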

After about 50 seconds, the client (Speech SDK 1.23.0) receives this error:

Connection was closed by the remote host. Error code: 1011. Error details: Internal server error. SessionId:
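Error code 1011 in this message appears to be a WebSocket close code. Under RFC 6455 (section 7.4.1), 1011 means the server closed the connection because it hit an unexpected condition, which points at the container side rather than the client. A minimal sketch that pulls the code out of the message and labels it; the class is hypothetical and only a few RFC 6455 codes are listed.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CloseCodeInfo {

    // Matches the "Error code: NNNN" fragment of the SDK error message.
    private static final Pattern CODE = Pattern.compile("Error code: (\\d+)");

    static Optional<Integer> closeCode(String errorDetails) {
        Matcher m = CODE.matcher(errorDetails);
        return m.find() ? Optional.of(Integer.parseInt(m.group(1))) : Optional.empty();
    }

    /** A subset of WebSocket close codes from RFC 6455, section 7.4.1. */
    static String describe(int code) {
        switch (code) {
            case 1000: return "normal closure";
            case 1001: return "going away";
            case 1002: return "protocol error";
            case 1008: return "policy violation";
            case 1011: return "server hit an unexpected condition (internal error)";
            default:   return "code not in this table";
        }
    }

    public static void main(String[] args) {
        String details = "Connection was closed by the remote host. Error code: 1011. "
            + "Error details: Internal server error. SessionId:";
        closeCode(details).ifPresent(c -> System.out.println(c + " -> " + describe(c)));
    }
}
```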

The strange thing is that with the old container (build number 20211103.4), everything works fine.
So why? I'm confused.
Should the container image use a specific locale tag, such as 3.5.0-amd64-zh-cn?
https://learn.microsoft.com/en-us/azure/cognitive-services/containers/container-image-tags?tabs=current#speech-to-text
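The example tag above follows a `{version}-amd64-{locale}` pattern. A minimal sketch that builds such a reference, so the container can be pinned to an explicit build instead of `latest`; the pattern is inferred from the example tag, the class is hypothetical, and whether a locale-specific tag avoids the 1011 error is not established here.

```java
import java.util.Locale;

public class SpeechImageTag {

    static final String REPO =
        "mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text";

    /** Builds "REPO:{version}-amd64-{locale}", assuming tags use lowercase locales. */
    static String imageFor(String version, String locale) {
        // zh-CN -> zh-cn
        return REPO + ":" + version + "-amd64-" + locale.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(imageFor("3.5.0", "zh-CN"));
    }
}
```

The printed reference can be used in the docker run command in place of the bare image name, which also makes it clear which build is running when comparing against the older working container.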

Tags: Azure AI Speech, Azure Container Instances

1 answer

  1. XS
    2022-09-02T10:01:19.99+00:00

    These debug messages may be helpful:

    info: SpeechToText[0]
    DecoderStop correlationId='46d4cd9b-8208-49e1-b22b-071c098a4a70' connectionId='e3803d4a-fabb-4e5e-97a8-b98ffab8900e' clientConnectionId='f5c4ec51-e480-400f-80b0-8b37481b1f00' impressionId='c3c83ad3-a5f3-4b60-9d0b-ee4a4a2f3f6c' turnId='a5da1392-a4ec-456e-9bc8-afc92af59c90' clientId='00000000-0000-0000-0000-000000000000' trafficType='Untagged' locale='zh-CN' endpoint='/speech/recognition/conversation/cognitiveservices/v1' Event_Time_Utc='2022-09-02T09:56:51.2364809Z' activityId='' relatedActivityId='' decoderInstanceId='ab1ec349-5457-4dda-95b8-11516ddfde11' recognitionType='Conversation' options='{"Locale":"zh-CN","ScenarioType":"Conversation","GrammarScenario":null,"InitialSilenceTimeout":null,"TrailingSilenceTimeout":null,"MaxNBestCount":5,"BondedAudioFormat":{},"KeywordSilenceTimeout":null,"IsEprEnabled":false,"SegmentationMode":"Normal","InterimResults":true,"SegmentationSilenceTimeout":null}' audioStartTime='2022-09-02T09:56:41.1383140Z' audioEndTime='2022-09-02T09:56:41.4673758Z' audioSizeBytes='896334' audioDuration='28010' speechDuration='' firstIntermediatePhraseTime='' firstFinalPhraseTime='' firstIntermediatePhraseStartOfSpeechLatency='' firstIntermediatePhraseRecognitionLatency='' firstFinalPhraseRecognitionLatency='' firstFinalPhraseRecognitionStatus='' lastFinalPhraseTime='' lastFinalPhraseRecognitionStatus='' lastFinalPhraseRecognitionLatency='' totalFinalPhraseCount='0' streamEndReason='' averageAudioFlowRate='85.12078886093737' reconnectCount='10' hasRecognitionDegraded='False' recognitionDegradedReason='' startTime='2022-09-02T09:56:41.1382953Z' endTime='2022-09-02T09:56:51.2364809Z' error='Grpc.Core.RpcException: Status(StatusCode="Unknown", Detail="Unexpected error in RPC handling")
    at Grpc.Core.Internal.ClientResponseStream`2.MoveNext(CancellationToken token)
    at Shared.Grpc.Abstractions.ActiveRpcTrackingInterceptor.AsyncStreamReaderProxy`1.MoveNext(CancellationToken cancellationToken)
    at Shared.Grpc.Net.Interceptors.InstrumentedInterceptor.AsyncStreamReaderProxy`1.MoveNext(CancellationToken cancellationToken)
    at SpeechRecognition.Clients.Extensions.Decoder.Rpc.ResilientRecognizeAsyncDuplexStreamingCall.<>c__DisplayClass32_0.<<MoveNext>b__0>d.MoveNext()
    --- End of stack trace from previous location ---
    at Polly.Retry.AsyncRetryEngine.ImplementationAsync[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Func`5 onRetryAsync, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider, Boolean continueOnCapturedContext)
    at Polly.AsyncPolicy.ExecuteAsync[TResult]
    at SpeechRecognition.Clients.Extensions.Decoder.Rpc.ResilientRecognizeAsyncDuplexStreamingCall.MoveNext(CancellationToken cancellationToken)
    at SpeechRecognition.Clients.Decoder.Rpc.RpcRecognition`1.HandleResponsesAsync(IAsyncStreamReader`1 responseStream, CancellationToken cancellationToken)
    at SpeechRecognition.Clients.Extensions.Decoder.Rpc.ResilientRecognizeAsyncDuplexStreamingCall.CompleteAsync()
    at SpeechRecognition.Clients.Decoder.Rpc.RpcRecognition`1.CompleteRecognitionRequestAsync()
    at SpeechRecognition.Clients.Decoder.Rpc.RpcRecognition`1.CleanUpAsync(Exception e)' decoderLocale='zh-CN'
    info: SpeechToText[0]
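The DecoderStop line above packs its diagnostics into key='value' pairs. A minimal sketch that extracts them, assuming values contain no single quotes (true for the fields of interest here). It also checks the byte count against the logged duration, assuming 16 kHz, 16-bit, mono PCM and that audioDuration is in milliseconds: 896334 bytes / 32000 bytes per second gives about 28010 ms, matching audioDuration='28010'. If those assumptions hold, the audio seems to reach the decoder intact, while totalFinalPhraseCount='0' and reconnectCount='10' suggest the decoder RPC kept failing. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DecoderStopFields {

    // key='value' pairs; assumes no single quote appears inside a value.
    private static final Pattern PAIR = Pattern.compile("(\\w+)='([^']*)'");

    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    /** Milliseconds of audio implied by a byte count, assuming 16 kHz, 16-bit, mono PCM. */
    static long pcmDurationMs(long bytes) {
        long bytesPerSecond = 16_000L * 2; // 16,000 samples/s * 2 bytes/sample * 1 channel
        return bytes * 1000 / bytesPerSecond;
    }

    public static void main(String[] args) {
        String line = "DecoderStop locale='zh-CN' audioSizeBytes='896334' audioDuration='28010' "
            + "totalFinalPhraseCount='0' reconnectCount='10'";
        Map<String, String> f = parse(line);
        System.out.println("reconnectCount=" + f.get("reconnectCount")
            + " totalFinalPhraseCount=" + f.get("totalFinalPhraseCount"));
        System.out.println("implied ms=" + pcmDurationMs(Long.parseLong(f.get("audioSizeBytes")))
            + " logged ms=" + f.get("audioDuration"));
    }
}
```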
