spx translate not storing results

Question

User Sense 1

Command

spx translate --once --file audio.mp3 --format mp3 --source de-DE --target nl-NL  --output batch json --output batch file test.json

Result on screen

Connection CONNECTED...
TRANSLATING into 'nl': OK (from 'Okay')
TRANSLATING into 'nl': Oké, ik (from 'Okay, ich')
TRANSLATING into 'nl': Oké, ik ben met (from 'Okay, ich bin bei')
TRANSLATING into 'nl': Oké, ik ben met lampen (from 'Okay, ich bin bei Lampen')
TRANSLATING into 'nl': Oké, ik ben bij lampen en gloeit (from 'Okay, ich bin bei Lampen und leucht')
TRANSLATING into 'nl': Oké, ik hou van lampen en leuchter.cn (from 'Okay, ich bin bei Lampen und leuchter.cn')
TRANSLATING into 'nl': Oké, ik ben met lampen en leuchter.cn me (from 'Okay, ich bin bei Lampen und leuchter.cn ich')
TRANSLATED into 'nl': Oké, ik ben met lampen en leuchter.cn ik zou doen. (from 'Okay, ich bin bei Lampen und leuchter.cn, ich würde.')

SESSION STOPPED: 2f088491dd914ec9bbf27a594c35a8f1

Result in json file

{
  "AudioFileResults": [
    {
      "AudioFileName": "audio.mp3",
      "AudioFileUrl": "audio.mp3",
      "AudioLengthInSeconds": 7.3,
      "CombinedResults": [
        {
          "ChannelNumber": "0",
          "Lexical": "okay ich bin bei lampen und leuchter cn ich würde",
          "ITN": "Okay, ich bin bei Lampen und leuchter.cn, ich würde.",
          "MaskedITN": "Okay, ich bin bei Lampen und leuchter.cn, ich würde.",
          "Display": "Okay, ich bin bei Lampen und leuchter.cn, ich würde."
        }
      ],
      "SegmentResults": [
        {
          "RecognitionStatus": "Success",
          "ChannelNumber": "0",
          "SpeakerId": null,
          "Offset": 23700000,
          "Duration": 53000000,
          "OffsetInSeconds": 2.37,
          "DurationInSeconds": 5.3,
          "NBest": []
        }
      ]
    }
  ]
}

The translated text is not present in the test.json file. What am I missing?
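
The posted test.json can be inspected programmatically to confirm what it actually holds. The following is a minimal Python sketch, not part of the Speech CLI. It assumes the file follows the structure shown above and that Offset and Duration are 100-nanosecond ticks (inferred from Offset 23700000 appearing next to OffsetInSeconds 2.37). The function name summarize_batch_output is made up for illustration.

```python
import json

# Assumption: Offset/Duration are 100-nanosecond ticks, inferred from the
# posted file (Offset 23700000 next to OffsetInSeconds 2.37).
TICKS_PER_SECOND = 10_000_000


def summarize_batch_output(doc):
    """Return one summary dict per segment found in a batch-style result."""
    rows = []
    for audio in doc.get("AudioFileResults", []):
        for seg in audio.get("SegmentResults", []):
            rows.append({
                "file": audio.get("AudioFileName"),
                "status": seg.get("RecognitionStatus"),
                "start_s": seg["Offset"] / TICKS_PER_SECOND,
                "duration_s": seg["Duration"] / TICKS_PER_SECOND,
                # An empty NBest list means the segment carries no text at all.
                "nbest_count": len(seg.get("NBest") or []),
            })
    return rows


# Trimmed copy of the test.json content posted above.
sample = json.loads("""
{
  "AudioFileResults": [
    {
      "AudioFileName": "audio.mp3",
      "SegmentResults": [
        {
          "RecognitionStatus": "Success",
          "Offset": 23700000,
          "Duration": 53000000,
          "NBest": []
        }
      ]
    }
  ]
}
""")

for row in summarize_batch_output(sample):
    print(row)
```

For the posted file this reports a single successful segment with timing information but an empty NBest list, which matches the observation that the file holds the transcription and timing but no translation.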

romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator

2022-05-23T07:03:12.317+00:00
@User Sense I think the command is right. I tried the same with microphone input and it seems to create a file. Does it work with a microphone?
Are there any permission restrictions on creating a file on your machine?

spx translate --once --microphone --source en-US --target de --output batch json --output batch file my.output.json

User Sense 1 Reputation point

2022-05-23T10:58:05.963+00:00

Hello @romungi-MSFT ,

The current command does create a file, but the translation is missing. The content of the file is posted in my initial post (Result in json file). Only the transcription is available, but not the translation.

I'm unable to test it with a microphone. I'm using the command in a server context, without a microphone present.

romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator

2022-05-23T11:52:53.55+00:00

@User Sense I understand the issue now and see similar behavior with microphone too. Let me check this with our product team internally and I will get back to you. Thanks!!

Answer 1

romungi-MSFT 48,911 Microsoft Employee Moderator

@User Sense I can confirm that the above behavior is expected: the --output option used with the command, "batch json FILENAME", does not support translation and outputs the exact BATCH REST API JSON schema. It did, however, create a .tsv file in the same directory, and the translation information is available inside that file. If you want the translated text in a JSON file, you can use this command format:

spx translate --once --microphone --source en-US --target de --output all file type json --output all file my.output3.json

Basically use all instead of batch.

More details about the output flags for printing specific information are available in the tool's built-in help:

spx help translate output  
spx help translate output all  
spx help translate output each  
spx help translate output examples

If an answer is helpful, please accept or upvote it, which might help other community members reading this thread.

User Sense 1

Hello @romungi-MSFT ,

This is not a solution.

Command:

spx translate --once --file audio.mp3 --format mp3 --source de-DE --target nl-NL  --output all file type json --output all file test.json

Result on screen

Connection CONNECTED...  
TRANSLATING into 'nl': OK (from 'Okay')  
TRANSLATING into 'nl': Oké, ik (from 'Okay, ich')  
TRANSLATING into 'nl': Oké, ik ben met (from 'Okay, ich bin bei')  
TRANSLATING into 'nl': Oké, ik ben met lampen (from 'Okay, ich bin bei Lampen')  
TRANSLATING into 'nl': Oké, ik ben bij lampen en gloeit (from 'Okay, ich bin bei Lampen und leucht')  
TRANSLATING into 'nl': Oké, ik ben met lampen en lichten op (from 'Okay, ich bin bei Lampen und leuchten')  
TRANSLATING into 'nl': Oké, ik hou van lampen en leuchter.cn (from 'Okay, ich bin bei Lampen und leuchter.cn')  
TRANSLATING into 'nl': Oké, ik ben met lampen en leuchter.cn me (from 'Okay, ich bin bei Lampen und leuchter.cn ich')  
TRANSLATING into 'nl': Oké, ik ben met lampen en leuchter.cn ik zou doen (from 'Okay, ich bin bei Lampen und leuchter.cn, ich würde')  
TRANSLATED into 'nl': Oké, ik ben met lampen en leuchter.cn ik zou doen. (from 'Okay, ich bin bei Lampen und leuchter.cn, ich würde.')  
  
SESSION STOPPED: 0243e9fe7e854399883fc6c8d8a2bd0f

Result in test.json

{  
  "audio.input.id": "audio",  
  "recognizer.session.started.sessionid": "0243e9fe7e854399883fc6c8d8a2bd0f",  
  "recognizer.recognized.result.text": "Okay, ich bin bei Lampen und leuchter.cn, ich würde."  
}

As you can see, still no translation. I also need the additional timing information.
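
A quick way to verify this for any file written with "--output all" is to list the keys that mention translation. The following is a minimal Python sketch under stated assumptions: the file is a flat JSON object with dotted keys, as in the test.json shown above, and a translation entry, if one were present, would have "translat" somewhere in its key name. That naming is a guess, not something documented in this thread, and find_translation_keys is a hypothetical helper.

```python
import json


def find_translation_keys(flat_result):
    """Return the keys of a flat result object that mention translation.

    Assumption: a translation entry would contain "translat" in its key name.
    """
    return sorted(key for key in flat_result if "translat" in key.lower())


# Copy of the test.json content posted above.
sample = json.loads("""
{
  "audio.input.id": "audio",
  "recognizer.session.started.sessionid": "0243e9fe7e854399883fc6c8d8a2bd0f",
  "recognizer.recognized.result.text": "Okay, ich bin bei Lampen und leuchter.cn, ich würde."
}
""")

print(sorted(sample))                 # every key the file actually contains
print(find_translation_keys(sample))  # prints [] for the file above
```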

romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator

2022-05-24T13:41:16.463+00:00
@User Sense Do you also need the SegmentResults information in the output, along with the translated text, as seen in the batch type output?
I think that part of the result is not available or supported as of now, as the help output indicates:

spx help translate output batch

SPX - Azure Speech CLI, Version 1.21.0
Copyright (c) 2020 Microsoft Corporation. All Rights Reserved.

translate output batch

NOT YET WRITTEN

Answer 2

@romungi-MSFT We are planning to provide our customers with a translated transcription based on the audio file, in combination with the original transcription in the language of the audio file (spx synthesize).

Just the translation without a time reference isn't very helpful.

It is odd that it is not possible to store the translated text in a file (JSON). Neither the batch nor the file approach produces output that contains the translation, even though that is the purpose of the translate command. Even the tsv file is missing the translation.

/root/.dotnet/tools/spx translate --once --file audio.mp3 --format mp3 --source de-DE --target nl-NL

On screen

SESSION STARTED: ffa365b781f549df829df745562518be  
  
Connection CONNECTED...  
TRANSLATING into 'nl': OK (from 'Okay')  
...  
TRANSLATING into 'nl': Oké, ik ben met lampen en leuchter.cn me (from 'Okay, ich bin bei Lampen und leuchter.cn ich')  
TRANSLATING into 'nl': Oké, ik ben met lampen en leuchter.cn ik zou doen (from 'Okay, ich bin bei Lampen und leuchter.cn, ich würde')  
TRANSLATED into 'nl': Oké, ik ben met lampen en leuchter.cn ik zou doen. (from 'Okay, ich bin bei Lampen und leuchter.cn, ich würde.')  
  
SESSION STOPPED: ffa365b781f549df829df745562518be

tsv file

audio.input.id  recognizer.session.started.sessionid    recognizer.recognized.result.text  
audio   ffa365b781f549df829df745562518be        Okay, ich bin bei Lampen und leuchter.cn, ich würde.

Only the original transcription, no translation. Even the commands that should work do not work as expected.

Answer 3

User Sense 1

I guess I will stick to spx translate --once --source de-DE --target nl --file audio.mp3 --format any --output vtt file caption.vtt for now.

This results in a valid caption.nl.vtt file

Another issue is that the same command with --target nl-NL instead of nl:

spx translate --once --source de-DE --target nl-NL --file audio.mp3 --format any --output vtt file caption.vtt

results in the file caption.nl-NL.vtt, but that file is empty. The translation is shown on the screen but not exported :-(
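
Since the translation does appear on screen, one possible workaround is to capture the console output and extract the final result lines from it. The following is a minimal Python sketch, not an official Speech CLI feature. It assumes the console keeps the "TRANSLATED into '<lang>': <text> (from '<source>')" line format shown in this thread, which may change between versions. The names FINAL_LINE and extract_translations are made up for illustration.

```python
import re

# Matches only final results ("TRANSLATED"), not the intermediate
# "TRANSLATING" lines. The format is taken from the console output
# posted in this thread and is an assumption for other CLI versions.
FINAL_LINE = re.compile(
    r"^TRANSLATED into '(?P<lang>[^']+)': (?P<text>.*) \(from '(?P<source>.*)'\)$"
)


def extract_translations(console_text):
    """Return a list of {lang, text, source} dicts, one per final result line."""
    results = []
    for line in console_text.splitlines():
        match = FINAL_LINE.match(line.strip())
        if match:
            results.append(match.groupdict())
    return results


# Shortened copy of the on-screen output posted above.
console_output = """\
Connection CONNECTED...
TRANSLATING into 'nl': OK (from 'Okay')
TRANSLATING into 'nl': Oké, ik (from 'Okay, ich')
TRANSLATED into 'nl': Oké, ik ben met lampen en leuchter.cn ik zou doen. (from 'Okay, ich bin bei Lampen und leuchter.cn, ich würde.')

SESSION STOPPED: ffa365b781f549df829df745562518be
"""

for item in extract_translations(console_output):
    print(item["lang"], "|", item["text"], "|", item["source"])
```

In a server context the console text could be obtained by running the spx command through Python's subprocess.run with capture_output=True and text=True, then passing the captured stdout to extract_translations. This recovers the translated text only; the timing information would still have to come from the batch JSON or the VTT file.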
