Použití rozhraní API pro rychlý přepis s využitím služby Azure AI Speech

Článek
11/12/2024

Rozhraní API pro rychlý přepis se používá k přepisu zvukových souborů s synchronním a rychlejším vrácením výsledků než v reálném čase. Používejte rychlý přepis ve scénářích, kdy potřebujete co nejrychleji přepis zvukového záznamu s předvídatelnou latencí, například:

Rychlý přepis zvuku nebo videa, titulky a úpravy
Překlad videa

Na rozdíl od dávkového rozhraní API pro přepis vytvoří rozhraní API pro rychlý přepis pouze přepisy ve formuláři zobrazení (ne lexikální). Formulář zobrazení je čitelnější formou přepisu, která zahrnuje interpunkci a velká písmena.

Požadavky

Prostředek Azure AI Speech v jedné z oblastí, ve kterých je k dispozici rozhraní API pro rychlý přepis. Podporované oblasti jsou: Austrálie – východ, Brazílie – jih, Indie – střed, USA – východ, USA – východ 2, Francie – střed, Japonsko – východ, USA – středosever, Severní Evropa, USA – středojiž, Jihovýchodní Asie, Švédsko – střed, Západní Evropa, USA – západ, USA – západ 2, USA – západ 3. Další informace o oblastech podporovaných pro jiné funkce služby Speech najdete v tématu Oblasti služby Speech.
Zvukový soubor (kratší než 2 hodiny a velikost menší než 200 MB) v jednom z formátů a kodeků podporovaných rozhraním API pro dávkový přepis. Další informace o podporovaných formátech zvuku najdete v tématu Podporované formáty zvuku.

Použití rozhraní API pro rychlý přepis

Tip

Vyzkoušejte si rychlý přepis v Azure AI Studiu.

Naučíme se používat rozhraní API pro rychlé přepisy (prostřednictvím přepisů – přepis) s následujícími scénáři:

Zadané známé národní prostředí: Přepis zvukového souboru se zadaným národním prostředím. Pokud znáte národní prostředí zvukového souboru, můžete ho určit, aby se zlepšila přesnost přepisu a minimalizovala se latence.
Identifikace jazyka: Přepis zvukového souboru s identifikací jazyka. Pokud si nejste jistí národním prostředím zvukového souboru, můžete zapnout identifikaci jazyka a nechat službu Speech identifikovat národní prostředí.
Diarizace zapnutá: Přepis zvukového souboru s diarizací. Diarizace rozlišuje mezi různými mluvčími v konverzaci. Služba Speech poskytuje informace o tom, o kterém mluvčím mluvil konkrétní část přepisované řeči.
Vícekanálový: Přepis zvukového souboru, který má jeden nebo dva kanály. Přepisy ve více kanálech jsou užitečné pro zvukové soubory s více kanály, například zvukové soubory s více reproduktory nebo zvukovými soubory s šumem na pozadí. Ve výchozím nastavení rozhraní API pro rychlé přepisy sloučí všechny vstupní kanály do jednoho kanálu a pak provede přepis. Pokud to není žádoucí, kanály je možné přepisovat nezávisle bez sloučení.

Vytvořte požadavek POST s více částmi nebo daty formulářů do koncového transcriptions bodu se zvukovým souborem a vlastnostmi textu požadavku.

Následující příklad ukazuje, jak přepsat zvukový soubor se zadaným národním prostředím. Pokud znáte národní prostředí zvukového souboru, můžete ho určit, aby se zlepšila přesnost přepisu a minimalizovala se latence.

Nahraďte YourSubscriptionKey klíčem prostředku služby Speech.
Nahraďte YourServiceRegion oblastí prostředků služby Speech.
Nahraďte YourAudioFile cestou ke zvukovému souboru.

curl --location 'https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/transcriptions:transcribe?api-version=2024-11-15' \
--header 'Content-Type: multipart/form-data' \
--header 'Ocp-Apim-Subscription-Key: YourSubscriptionKey' \
--form 'audio=@"YourAudioFile"' \
--form 'definition="{
    "locales":["en-US"]}"'

Definici formuláře vytvořte podle následujících pokynů:

Nastavte volitelnou (ale doporučenou) locales vlastnost, která by měla odpovídat očekávanému národnímu prostředí zvukových dat pro přepis. V tomto příkladu je národní prostředí nastaveno na en-UShodnotu . Podporovaná národní prostředí, která můžete zadat, jsou: de-DE, en-IN, en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR a zh-CN.

Další informace o locales rozhraní API pro rychlý přepis a další vlastnosti najdete v části Možnosti konfigurace požadavků dále v této příručce.

Odpověď zahrnuje durationMilliseconds, offsetMillisecondsa další. Vlastnost combinedPhrases obsahuje úplné přepisy pro všechny mluvčí.

{
	"durationMilliseconds": 182439,
	"combinedPhrases": [
		{
			"text": "Good afternoon. This is Sam. Thank you for calling Contoso. How can I help? Hi there. My name is Mary. I'm currently living in Los Angeles, but I'm planning to move to Las Vegas. I would like to apply for a loan. Okay. I see you're currently living in California. Let me make sure I understand you correctly. Uh You'd like to apply for a loan even though you'll be moving soon. Is that right? Yes, exactly. So I'm planning to relocate soon, but I would like to apply for the loan first so that I can purchase a new home once I move there. And are you planning to sell your current home? Yes, I will be listing it on the market soon and hopefully it'll sell quickly. That's why I'm applying for a loan now, so that I can purchase a new house in Nevada and close on it quickly as well once my current home sells. I see. Would you mind holding for a moment while I take your information down? Yeah, no problem. Thank you for your help. Mm-hmm. Just one moment. All right. Thank you for your patience, ma'am. May I have your first and last name, please? Yes, my name is Mary Smith. Thank you, Ms. Smith. May I have your current address, please? Yes. So my address is 123 Main Street in Los Angeles, California, and the zip code is 90923. Sorry, that was a 90 what? 90923. 90923 on Main Street. Got it. Thank you. May I have your phone number as well, please? Uh Yes, my phone number is 504-529-2351 and then yeah. 2351. Got it. And do you have an e-mail address we I can associate with this application? uh Yes, so my e-mail address is mary.a.sm78@gmail.com. Mary.a, was that a S-N as in November or M as in Mike? M as in Mike. Mike78, got it. Thank you. Ms. Smith, do you currently have any other loans? Uh Yes, so I currently have two other loans through Contoso. So my first one is my car loan and then my other is my student loan. They total about 1400 per month combined and my interest rate is 8%. I see. And you're currently paying those loans off monthly, is that right? Yes, of course I do. OK, thank you. Here's what I suggest we do. Let me place you on a brief hold again so that I can talk with one of our loan officers and get this started for you immediately. In the meantime, it would be great if you could take a few minutes and complete the remainder of the secure application online at www.contosoloans.com. Yeah, that sounds good. I can go ahead and get started. Thank you for your help. Thank you."
		}
	],
	"phrases": [
		{
			"offsetMilliseconds": 960,
			"durationMilliseconds": 640,
			"text": "Good afternoon.",
			"words": [
				{
					"text": "Good",
					"offsetMilliseconds": 960,
					"durationMilliseconds": 240
				},
				{
					"text": "afternoon.",
					"offsetMilliseconds": 1200,
					"durationMilliseconds": 400
				}
			],
			"locale": "en-US",
			"confidence": 0.93616915
		},
		{
			"offsetMilliseconds": 1600,
			"durationMilliseconds": 640,
			"text": "This is Sam.",
			"words": [
				{
					"text": "This",
					"offsetMilliseconds": 1600,
					"durationMilliseconds": 240
				},
				{
					"text": "is",
					"offsetMilliseconds": 1840,
					"durationMilliseconds": 120
				},
				{
					"text": "Sam.",
					"offsetMilliseconds": 1960,
					"durationMilliseconds": 280
				}
			],
			"locale": "en-US",
			"confidence": 0.93616915
		},
		{
			"offsetMilliseconds": 2240,
			"durationMilliseconds": 1040,
			"text": "Thank you for calling Contoso.",
			"words": [
				{
					"text": "Thank",
					"offsetMilliseconds": 2240,
					"durationMilliseconds": 200
				},
				{
					"text": "you",
					"offsetMilliseconds": 2440,
					"durationMilliseconds": 80
				},
				{
					"text": "for",
					"offsetMilliseconds": 2520,
					"durationMilliseconds": 120
				},
				{
					"text": "calling",
					"offsetMilliseconds": 2640,
					"durationMilliseconds": 200
				},
				{
					"text": "Contoso.",
					"offsetMilliseconds": 2840,
					"durationMilliseconds": 440
				}
			],
			"locale": "en-US",
			"confidence": 0.93616915
		},
		{
			"offsetMilliseconds": 3280,
			"durationMilliseconds": 640,
			"text": "How can I help?",
			"words": [
				{
					"text": "How",
					"offsetMilliseconds": 3280,
					"durationMilliseconds": 120
				},
				{
					"text": "can",
					"offsetMilliseconds": 3440,
					"durationMilliseconds": 120
				},
				{
					"text": "I",
					"offsetMilliseconds": 3560,
					"durationMilliseconds": 40
				},
				{
					"text": "help?",
					"offsetMilliseconds": 3600,
					"durationMilliseconds": 320
				}
			],
			"locale": "en-US",
			"confidence": 0.93616915
		},
		{
			"offsetMilliseconds": 5040,
			"durationMilliseconds": 400,
			"text": "Hi there.",
			"words": [
				{
					"text": "Hi",
					"offsetMilliseconds": 5040,
					"durationMilliseconds": 240
				},
				{
					"text": "there.",
					"offsetMilliseconds": 5280,
					"durationMilliseconds": 160
				}
			],
			"locale": "en-US",
			"confidence": 0.93616915
		},
		{
			"offsetMilliseconds": 5440,
			"durationMilliseconds": 800,
			"text": "My name is Mary.",
			"words": [
				{
					"text": "My",
					"offsetMilliseconds": 5440,
					"durationMilliseconds": 80
				},
				{
					"text": "name",
					"offsetMilliseconds": 5520,
					"durationMilliseconds": 120
				},
				{
					"text": "is",
					"offsetMilliseconds": 5640,
					"durationMilliseconds": 80
				},
				{
					"text": "Mary.",
					"offsetMilliseconds": 5720,
					"durationMilliseconds": 520
				}
			],
			"locale": "en-US",
			"confidence": 0.93616915
		},
		// More transcription results...
	    // Redacted for brevity
		{
			"offsetMilliseconds": 180320,
			"durationMilliseconds": 680,
			"text": "Thank you for your help.",
			"words": [
				{
					"text": "Thank",
					"offsetMilliseconds": 180320,
					"durationMilliseconds": 160
				},
				{
					"text": "you",
					"offsetMilliseconds": 180480,
					"durationMilliseconds": 80
				},
				{
					"text": "for",
					"offsetMilliseconds": 180560,
					"durationMilliseconds": 120
				},
				{
					"text": "your",
					"offsetMilliseconds": 180680,
					"durationMilliseconds": 120
				},
				{
					"text": "help.",
					"offsetMilliseconds": 180800,
					"durationMilliseconds": 200
				}
			],
			"locale": "en-US",
			"confidence": 0.9314801
		},
		{
			"offsetMilliseconds": 181960,
			"durationMilliseconds": 280,
			"text": "Thank you.",
			"words": [
				{
					"text": "Thank",
					"offsetMilliseconds": 181960,
					"durationMilliseconds": 200
				},
				{
					"text": "you.",
					"offsetMilliseconds": 182160,
					"durationMilliseconds": 80
				}
			],
			"locale": "en-US",
			"confidence": 0.9314801
		}
	]
}

Vytvořte požadavek POST s více částmi nebo daty formulářů do koncového transcriptions bodu se zvukovým souborem a vlastnostmi textu požadavku.

Následující příklad ukazuje, jak přepsat zvukový soubor s identifikací jazyka. Pokud si nejste jistí národním prostředím, můžete zadat více národních prostředí. Pokud nezadáte žádné národní prostředí nebo pokud zadaná národní prostředí nejsou ve zvukovém souboru, služba Speech se pokusí identifikovat národní prostředí.

Nahraďte YourSubscriptionKey klíčem prostředku služby Speech.
Nahraďte YourServiceRegion oblastí prostředků služby Speech.
Nahraďte YourAudioFile cestou ke zvukovému souboru.

curl --location 'https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/transcriptions:transcribe?api-version=2024-11-15' \
--header 'Content-Type: multipart/form-data' \
--header 'Ocp-Apim-Subscription-Key: YourSubscriptionKey' \
--form 'audio=@"YourAudioFile"' \
--form 'definition="{
    "locales":["en-US","ja-JP"]}"'

Definici formuláře vytvořte podle následujících pokynů:

Nastavte volitelnou (ale doporučenou) locales vlastnost, která by měla odpovídat očekávanému národnímu prostředí zvukových dat pro přepis. V tomto příkladu jsou národní prostředí nastavena na en-US a ja-JP. Podporovaná národní prostředí, která můžete zadat, jsou: de-DE, en-IN, en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR a zh-CN.

Další informace o locales rozhraní API pro rychlý přepis a další vlastnosti najdete v části Možnosti konfigurace požadavků dále v této příručce.

Odpověď zahrnuje durationMilliseconds, offsetMillisecondsa další. Vlastnost combinedPhrases obsahuje úplné přepisy pro všechny mluvčí.

{
	"durationMilliseconds": 185079,
	"combinedPhrases": [
		{
			"text": "Hello, thank you for calling Contoso. Who am I speaking with today? Hi, my name is Mary Rondo. I'm trying to enroll myself with Contoso. Hi, Mary. Are you calling because you need health insurance? Yes. Yeah, I'm calling to sign up for insurance. Great. Uh If you can answer a few questions, we can get you signed up in a Jiffy. Okay. So what's your full name? uh So Mary Beth Rondo, last name is R like Romeo, O like Ocean, N like Nancy D, D like Dog, and O like Ocean again. Rondo. Got it. And what's the best callback number in case we get disconnected? I only have a cell phone, so I can give you that. Yep, that'll be fine. Sure. So it's 234-554 and then 9312. Got it. So to confirm, it's 234-554-9312. Yep, that's right. Excellent. Let's get some additional information for your application. Do you have a job? Uh Yes, I am self-employed. Okay, so then you have a social security number as well? Uh Yes, I do. Okay, and what is your social security number, please? Uh Sure, so it's 412-253-4931. 6789. Sorry, was that a 25 or a 225? You cut out for a bit. It's double two, so 412, then another two, then five. Thank you so much. And could I have your e-mail address, please? Yeah, it's maryrondo@gmail.com. So my first and last name at gmail.com. No periods, no dashes. Great. Uh That is the last question. So let me take your information and I'll be able to get you signed up right away. Thank you for calling Contoso and I'll be able to get you signed up immediately. One of our agents will call you back in about 24 hours or so to confirm your application. That sounds good. Thank you. Absolutely. If you need anything else, please give us a call at 1-800-555-5564, extension 123. Thank you very much for calling Contoso. Actually, so I have one more question. Yes, of course. I'm curious, will I be getting a physical card as proof of coverage? So the default is a digital membership card, but we can send you a physical card if you prefer. Uh Yes. Could you please mail it to me when it's ready? I'd like to have it shipped to, are you ready for my address? Uh Yeah. uh So it's 2660 Unit A on Maple Avenue, Southeast Lansing, and then zip code is 48823. Absolutely. I've made a note on your file. Awesome. Thanks so much. You're very welcome. Thank you for calling Contoso and have a great day."
		}
	],
	"phrases": [
		{
			"offsetMilliseconds": 720,
			"durationMilliseconds": 1600,
			"text": "Hello, thank you for calling Contoso.",
			"words": [
				{
					"text": "Hello,",
					"offsetMilliseconds": 720,
					"durationMilliseconds": 480
				},
				{
					"text": "thank",
					"offsetMilliseconds": 1200,
					"durationMilliseconds": 200
				},
				{
					"text": "you",
					"offsetMilliseconds": 1400,
					"durationMilliseconds": 80
				},
				{
					"text": "for",
					"offsetMilliseconds": 1480,
					"durationMilliseconds": 120
				},
				{
					"text": "calling",
					"offsetMilliseconds": 1600,
					"durationMilliseconds": 240
				},
				{
					"text": "Contoso.",
					"offsetMilliseconds": 1840,
					"durationMilliseconds": 480
				}
			],
			"locale": "en-US",
			"confidence": 0.93265927
		},
		{
			"offsetMilliseconds": 2320,
			"durationMilliseconds": 1120,
			"text": "Who am I speaking with today?",
			"words": [
				{
					"text": "Who",
					"offsetMilliseconds": 2320,
					"durationMilliseconds": 160
				},
				{
					"text": "am",
					"offsetMilliseconds": 2480,
					"durationMilliseconds": 80
				},
				{
					"text": "I",
					"offsetMilliseconds": 2560,
					"durationMilliseconds": 80
				},
				{
					"text": "speaking",
					"offsetMilliseconds": 2640,
					"durationMilliseconds": 320
				},
				{
					"text": "with",
					"offsetMilliseconds": 2960,
					"durationMilliseconds": 160
				},
				{
					"text": "today?",
					"offsetMilliseconds": 3120,
					"durationMilliseconds": 320
				}
			],
			"locale": "en-US",
			"confidence": 0.93265927
		},
		{
			"offsetMilliseconds": 4480,
			"durationMilliseconds": 1600,
			"text": "Hi, my name is Mary Rondo.",
			"words": [
				{
					"text": "Hi,",
					"offsetMilliseconds": 4480,
					"durationMilliseconds": 400
				},
				{
					"text": "my",
					"offsetMilliseconds": 4880,
					"durationMilliseconds": 120
				},
				{
					"text": "name",
					"offsetMilliseconds": 5000,
					"durationMilliseconds": 120
				},
				{
					"text": "is",
					"offsetMilliseconds": 5120,
					"durationMilliseconds": 160
				},
				{
					"text": "Mary",
					"offsetMilliseconds": 5280,
					"durationMilliseconds": 240
				},
				{
					"text": "Rondo.",
					"offsetMilliseconds": 5520,
					"durationMilliseconds": 560
				}
			],
			"locale": "en-US",
			"confidence": 0.93265927
		},
		{
			"offsetMilliseconds": 6120,
			"durationMilliseconds": 1800,
			"text": "I'm trying to enroll myself with Contoso.",
			"words": [
				{
					"text": "I'm",
					"offsetMilliseconds": 6120,
					"durationMilliseconds": 120
				},
				{
					"text": "trying",
					"offsetMilliseconds": 6240,
					"durationMilliseconds": 200
				},
				{
					"text": "to",
					"offsetMilliseconds": 6440,
					"durationMilliseconds": 80
				},
				{
					"text": "enroll",
					"offsetMilliseconds": 6520,
					"durationMilliseconds": 200
				},
				{
					"text": "myself",
					"offsetMilliseconds": 6720,
					"durationMilliseconds": 360
				},
				{
					"text": "with",
					"offsetMilliseconds": 7080,
					"durationMilliseconds": 120
				},
				{
					"text": "Contoso.",
					"offsetMilliseconds": 7200,
					"durationMilliseconds": 720
				}
			],
			"locale": "en-US",
			"confidence": 0.93265927
		},
		// More transcription results...
	    // Redacted for brevity
		{
			"offsetMilliseconds": 181520,
			"durationMilliseconds": 720,
			"text": "You're very welcome.",
			"words": [
				{
					"text": "You're",
					"offsetMilliseconds": 181520,
					"durationMilliseconds": 160
				},
				{
					"text": "very",
					"offsetMilliseconds": 181680,
					"durationMilliseconds": 200
				},
				{
					"text": "welcome.",
					"offsetMilliseconds": 181880,
					"durationMilliseconds": 360
				}
			],
			"locale": "en-US",
			"confidence": 0.90571773
		},
		{
			"offsetMilliseconds": 182320,
			"durationMilliseconds": 1840,
			"text": "Thank you for calling Contoso and have a great day.",
			"words": [
				{
					"text": "Thank",
					"offsetMilliseconds": 182320,
					"durationMilliseconds": 200
				},
				{
					"text": "you",
					"offsetMilliseconds": 182520,
					"durationMilliseconds": 80
				},
				{
					"text": "for",
					"offsetMilliseconds": 182600,
					"durationMilliseconds": 120
				},
				{
					"text": "calling",
					"offsetMilliseconds": 182720,
					"durationMilliseconds": 280
				},
				{
					"text": "Contoso",
					"offsetMilliseconds": 183000,
					"durationMilliseconds": 520
				},
				{
					"text": "and",
					"offsetMilliseconds": 183520,
					"durationMilliseconds": 160
				},
				{
					"text": "have",
					"offsetMilliseconds": 183680,
					"durationMilliseconds": 120
				},
				{
					"text": "a",
					"offsetMilliseconds": 183800,
					"durationMilliseconds": 40
				},
				{
					"text": "great",
					"offsetMilliseconds": 183840,
					"durationMilliseconds": 200
				},
				{
					"text": "day.",
					"offsetMilliseconds": 184040,
					"durationMilliseconds": 120
				}
			],
			"locale": "en-US",
			"confidence": 0.90571773
		}
	]
}

Vytvořte požadavek POST s více částmi nebo daty formulářů do koncového transcriptions bodu se zvukovým souborem a vlastnostmi textu požadavku.

Následující příklad ukazuje, jak přepsat zvukový soubor s povolenou diarizací. Diarizace rozlišuje mezi různými mluvčími v konverzaci. Služba Speech poskytuje informace o tom, o kterém mluvčím mluvil konkrétní část přepisované řeči.

Nahraďte YourSubscriptionKey klíčem prostředku služby Speech.
Nahraďte YourServiceRegion oblastí prostředků služby Speech.
Nahraďte YourAudioFile cestou ke zvukovému souboru.

curl --location 'https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/transcriptions:transcribe?api-version=2024-11-15' \
--header 'Content-Type: multipart/form-data' \
--header 'Ocp-Apim-Subscription-Key: YourSubscriptionKey' \
--form 'audio=@"YourAudioFile"' \
--form 'definition="{
    "locales":["en-US"], 
    "diarization": {"maxSpeakers": 2,"enabled": true}}"'

Definici formuláře vytvořte podle následujících pokynů:

Nastavte volitelnou (ale doporučenou) locales vlastnost, která by měla odpovídat očekávanému národnímu prostředí zvukových dat pro přepis. V tomto příkladu je národní prostředí nastaveno na en-UShodnotu . Podporovaná národní prostředí, která můžete zadat, jsou: de-DE, en-IN, en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR a zh-CN.
Nastavte vlastnost tak diarization , aby rozpoznala a oddělil více reproduktorů v jednom zvukovém kanálu. Zadejte například "diarization": {"maxSpeakers": 2, "enabled": true}. Soubor přepisu pak obsahuje speaker položky pro každou přepisovanou frázi.

Další informace o localesrozhraní diarizationAPI pro rychlé přepisy a další vlastnosti najdete v části Možnosti konfigurace požadavků dále v tomto průvodci.

Odpověď zahrnuje durationMilliseconds, offsetMillisecondsa další. V tomto příkladu je diarizace povolená, takže odpověď obsahuje speaker informace pro každou přepisovanou frázi. Vlastnost combinedPhrases obsahuje úplné přepisy pro všechny mluvčí v jednom kanálu.

{
	"durationMilliseconds": 182439,
	"combinedPhrases": [
		{
			"channel": 0,
			"text": "Good afternoon. This is Sam. Thank you for calling Contoso. How can I help? Hi there. My name is Mary. I'm currently living in Los Angeles, but I'm planning to move to Las Vegas. I would like to apply for a loan. Okay. I see you're currently living in California. Let me make sure I understand you correctly. Uh You'd like to apply for a loan even though you'll be moving soon. Is that right? Yes, exactly. So I'm planning to relocate soon, but I would like to apply for the loan first so that I can purchase a new home once I move there. And are you planning to sell your current home? Yes, I will be listing it on the market soon and hopefully it'll sell quickly. That's why I'm applying for a loan now, so that I can purchase a new house in Nevada and close on it quickly as well once my current home sells. I see. Would you mind holding for a moment while I take your information down? Yeah, no problem. Thank you for your help. Mm-hmm. Just one moment. All right. Thank you for your patience, ma'am. May I have your first and last name, please? Yes, my name is Mary Smith. Thank you, Ms. Smith. May I have your current address, please? Yes. So my address is 123 Main Street in Los Angeles, California, and the zip code is 90923. Sorry, that was a 90 what? 90923. 90923 on Main Street. Got it. Thank you. May I have your phone number as well, please? Uh. Yes, my phone number is 504-529-2351 and then yeah. 2351. Got it. And do you have an e-mail address we I can associate with this application? Uh Yes, so my e-mail address is mary.a.sm78@gmail.com. Mary.a, was that a S-N as in November or M as in Mike? M as in Mike. Mike78, got it. Thank you. Ms. Smith, do you currently have any other loans? Uh Yes, so I currently have two other loans through Contoso. So my first one is my car loan and then my other is my student loan. They total about 1400 per month combined and my interest rate is 8%. I see. And. You're currently paying those loans off monthly, is that right? Yes, of course I do. OK, thank you. Here's what I suggest we do. Let me place you on a brief hold again so that I can talk with one of our loan officers and get this started for you immediately. In the meantime, it would be great if you could take a few minutes and complete the remainder of the secure application online at www.contosoloans.com. Yeah, that sounds good. I can go ahead and get started. Thank you for your help. Thank you."
		}
	],
	"phrases": [
		{
			"channel": 0,
			"speaker": 1,
			"offsetMilliseconds": 960,
			"durationMilliseconds": 640,
			"text": "Good afternoon.",
			"words": [
				{
					"text": "Good",
					"offsetMilliseconds": 960,
					"durationMilliseconds": 240
				},
				{
					"text": "afternoon.",
					"offsetMilliseconds": 1200,
					"durationMilliseconds": 400
				}
			],
			"locale": "en-US",
			"confidence": 0.93616915
		},
		{
			"channel": 0,
			"speaker": 1,
			"offsetMilliseconds": 1600,
			"durationMilliseconds": 640,
			"text": "This is Sam.",
			"words": [
				{
					"text": "This",
					"offsetMilliseconds": 1600,
					"durationMilliseconds": 240
				},
				{
					"text": "is",
					"offsetMilliseconds": 1840,
					"durationMilliseconds": 120
				},
				{
					"text": "Sam.",
					"offsetMilliseconds": 1960,
					"durationMilliseconds": 280
				}
			],
			"locale": "en-US",
			"confidence": 0.93616915
		},
		{
			"channel": 0,
			"speaker": 1,
			"offsetMilliseconds": 2240,
			"durationMilliseconds": 1040,
			"text": "Thank you for calling Contoso.",
			"words": [
				{
					"text": "Thank",
					"offsetMilliseconds": 2240,
					"durationMilliseconds": 200
				},
				{
					"text": "you",
					"offsetMilliseconds": 2440,
					"durationMilliseconds": 80
				},
				{
					"text": "for",
					"offsetMilliseconds": 2520,
					"durationMilliseconds": 120
				},
				{
					"text": "calling",
					"offsetMilliseconds": 2640,
					"durationMilliseconds": 200
				},
				{
					"text": "Contoso.",
					"offsetMilliseconds": 2840,
					"durationMilliseconds": 440
				}
			],
			"locale": "en-US",
			"confidence": 0.93616915
		},
		{
			"channel": 0,
			"speaker": 1,
			"offsetMilliseconds": 3280,
			"durationMilliseconds": 640,
			"text": "How can I help?",
			"words": [
				{
					"text": "How",
					"offsetMilliseconds": 3280,
					"durationMilliseconds": 120
				},
				{
					"text": "can",
					"offsetMilliseconds": 3440,
					"durationMilliseconds": 120
				},
				{
					"text": "I",
					"offsetMilliseconds": 3560,
					"durationMilliseconds": 40
				},
				{
					"text": "help?",
					"offsetMilliseconds": 3600,
					"durationMilliseconds": 320
				}
			],
			"locale": "en-US",
			"confidence": 0.93616915
		},
		{
			"channel": 0,
			"speaker": 0,
			"offsetMilliseconds": 5040,
			"durationMilliseconds": 400,
			"text": "Hi there.",
			"words": [
				{
					"text": "Hi",
					"offsetMilliseconds": 5040,
					"durationMilliseconds": 240
				},
				{
					"text": "there.",
					"offsetMilliseconds": 5280,
					"durationMilliseconds": 160
				}
			],
			"locale": "en-US",
			"confidence": 0.93616915
		},
		{
			"channel": 0,
			"speaker": 0,
			"offsetMilliseconds": 5440,
			"durationMilliseconds": 800,
			"text": "My name is Mary.",
			"words": [
				{
					"text": "My",
					"offsetMilliseconds": 5440,
					"durationMilliseconds": 80
				},
				{
					"text": "name",
					"offsetMilliseconds": 5520,
					"durationMilliseconds": 120
				},
				{
					"text": "is",
					"offsetMilliseconds": 5640,
					"durationMilliseconds": 80
				},
				{
					"text": "Mary.",
					"offsetMilliseconds": 5720,
					"durationMilliseconds": 520
				}
			],
			"locale": "en-US",
			"confidence": 0.93616915
		},
		// More transcription results...
	    // Redacted for brevity
		{
			"channel": 0,
			"speaker": 0,
			"offsetMilliseconds": 180320,
			"durationMilliseconds": 680,
			"text": "Thank you for your help.",
			"words": [
				{
					"text": "Thank",
					"offsetMilliseconds": 180320,
					"durationMilliseconds": 160
				},
				{
					"text": "you",
					"offsetMilliseconds": 180480,
					"durationMilliseconds": 80
				},
				{
					"text": "for",
					"offsetMilliseconds": 180560,
					"durationMilliseconds": 120
				},
				{
					"text": "your",
					"offsetMilliseconds": 180680,
					"durationMilliseconds": 120
				},
				{
					"text": "help.",
					"offsetMilliseconds": 180800,
					"durationMilliseconds": 200
				}
			],
			"locale": "en-US",
			"confidence": 0.9314801
		},
		{
			"channel": 0,
			"speaker": 1,
			"offsetMilliseconds": 181960,
			"durationMilliseconds": 280,
			"text": "Thank you.",
			"words": [
				{
					"text": "Thank",
					"offsetMilliseconds": 181960,
					"durationMilliseconds": 200
				},
				{
					"text": "you.",
					"offsetMilliseconds": 182160,
					"durationMilliseconds": 80
				}
			],
			"locale": "en-US",
			"confidence": 0.9314801
		}
    ]
}

Vytvořte požadavek POST s více částmi nebo daty formulářů do koncového transcriptions bodu se zvukovým souborem a vlastnostmi textu požadavku.

Následující příklad ukazuje, jak přepsat zvukový soubor, který má jeden nebo dva kanály. Přepisy ve více kanálech jsou užitečné pro zvukové soubory s více kanály, například zvukové soubory s více reproduktory nebo zvukovými soubory s šumem na pozadí. Ve výchozím nastavení rozhraní API pro rychlé přepisy sloučí všechny vstupní kanály do jednoho kanálu a pak provede přepis. Pokud to není žádoucí, kanály je možné přepisovat nezávisle bez sloučení.

Nahraďte YourSubscriptionKey klíčem prostředku služby Speech.
Nahraďte YourServiceRegion oblastí prostředků služby Speech.
Nahraďte YourAudioFile cestou ke zvukovému souboru.

curl --location 'https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/transcriptions:transcribe?api-version=2024-11-15' \
--header 'Content-Type: multipart/form-data' \
--header 'Ocp-Apim-Subscription-Key: YourSubscriptionKey' \
--form 'audio=@"YourAudioFile"' \
--form 'definition="{
    "locales":["en-US"], 
    "channels": [0,1]}"'

Definici formuláře vytvořte podle následujících pokynů:

Nastavte volitelnou (ale doporučenou) locales vlastnost, která by měla odpovídat očekávanému národnímu prostředí zvukových dat pro přepis. V tomto příkladu je národní prostředí nastaveno na en-UShodnotu . Podporovaná národní prostředí, která můžete zadat, jsou: de-DE, en-IN, en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR a zh-CN.
channels Nastavte vlastnost tak, aby se určily indexy založené na nule kanálů, které se mají přepisovat samostatně. Pokud není povolená diarizace, podporují se až dva kanály. V tomto příkladu jsou zadány kanály 0 a 1.

Další informace o localesrozhraní channelsAPI pro rychlé přepisy a další vlastnosti najdete v části Možnosti konfigurace požadavků dále v tomto průvodci.

Odpověď zahrnuje durationMilliseconds, offsetMillisecondsa další. Vlastnost channel identifikuje kanál, pokud zvukový soubor obsahuje více kanálů. Vlastnost combinedPhrases obsahuje úplné přepisy oddělené pro každý zvukový kanál. "channel": 0,"text" Vyhledejte a "channel": 1,"text" identifikujte úplné přepisy jednotlivých kanálů.

{
	"durationMilliseconds": 185079,
	"combinedPhrases": [
		{
			"channel": 0,
			"text": "Hello. Thank you for calling Contoso. Who am I speaking with today? Hi, Mary. Are you calling because you need health insurance? Great. If you can answer a few questions, we can get you signed up in the Jiffy. So what's your full name? Got it. And what's the best callback number in case we get disconnected? Yep, that'll be fine. Got it. So to confirm, it's 234-554-9312. Excellent. Let's get some additional information for your application. Do you have a job? OK, so then you have a Social Security number as well. OK, and what is your Social Security number please? Sorry, what was that, a 25 or a 225? You cut out for a bit. Alright, thank you so much. And could I have your e-mail address please? Great. Uh That is the last question. So let me take your information and I'll be able to get you signed up right away. Thank you for calling Contoso and I'll be able to get you signed up immediately. One of our agents will call you back in about 24 hours or so to confirm your application. Absolutely. If you need anything else, please give us a call at 1-800-555-5564, extension 123. Thank you very much for calling Contoso. Uh Yes, of course. So the default is a digital membership card, but we can send you a physical card if you prefer. Uh, yeah. Absolutely. I've made a note on your file. You're very welcome. Thank you for calling Contoso and have a great day."
		},
		{
			"channel": 1,
			"text": "Hi, my name is Mary Rondo. I'm trying to enroll myself with Contuso. Yes, yeah, I'm calling to sign up for insurance. Okay. So Mary Beth Rondo, last name is R like Romeo, O like Ocean, N like Nancy D, D like Dog, and O like Ocean again. Rondo. I only have a cell phone so I can give you that. Sure, so it's 234-554 and then 9312. Yep, that's right. Uh Yes, I am self-employed. Yes, I do. Uh Sure, so it's 412256789. It's double two, so 412, then another two, then five. Yeah, it's maryrondo@gmail.com. So my first and last name at gmail.com. No periods, no dashes. That was quick. Thank you. Actually, so I have one more question. I'm curious, will I be getting a physical card as proof of coverage? uh Yes. Could you please mail it to me when it's ready? I'd like to have it shipped to, are you ready for my address? So it's 2660 Unit A on Maple Avenue SE, Lansing, and then zip code is 48823. Awesome. Thanks so much."
		}
	],
	"phrases": [
		{
			"channel": 0,
			"offsetMilliseconds": 720,
			"durationMilliseconds": 480,
			"text": "Hello.",
			"words": [
				{
					"text": "Hello.",
					"offsetMilliseconds": 720,
					"durationMilliseconds": 480
				}
			],
			"locale": "en-US",
			"confidence": 0.9177142
		},
		{
			"channel": 0,
			"offsetMilliseconds": 1200,
			"durationMilliseconds": 1120,
			"text": "Thank you for calling Contoso.",
			"words": [
				{
					"text": "Thank",
					"offsetMilliseconds": 1200,
					"durationMilliseconds": 200
				},
				{
					"text": "you",
					"offsetMilliseconds": 1400,
					"durationMilliseconds": 80
				},
				{
					"text": "for",
					"offsetMilliseconds": 1480,
					"durationMilliseconds": 120
				},
				{
					"text": "calling",
					"offsetMilliseconds": 1600,
					"durationMilliseconds": 240
				},
				{
					"text": "Contoso.",
					"offsetMilliseconds": 1840,
					"durationMilliseconds": 480
				}
			],
			"locale": "en-US",
			"confidence": 0.9177142
		},
		{
			"channel": 0,
			"offsetMilliseconds": 2320,
			"durationMilliseconds": 1120,
			"text": "Who am I speaking with today?",
			"words": [
				{
					"text": "Who",
					"offsetMilliseconds": 2320,
					"durationMilliseconds": 160
				},
				{
					"text": "am",
					"offsetMilliseconds": 2480,
					"durationMilliseconds": 80
				},
				{
					"text": "I",
					"offsetMilliseconds": 2560,
					"durationMilliseconds": 80
				},
				{
					"text": "speaking",
					"offsetMilliseconds": 2640,
					"durationMilliseconds": 320
				},
				{
					"text": "with",
					"offsetMilliseconds": 2960,
					"durationMilliseconds": 160
				},
				{
					"text": "today?",
					"offsetMilliseconds": 3120,
					"durationMilliseconds": 320
				}
			],
			"locale": "en-US",
			"confidence": 0.9177142
		},
		{
			"channel": 0,
			"offsetMilliseconds": 9520,
			"durationMilliseconds": 400,
			"text": "Hi, Mary.",
			"words": [
				{
					"text": "Hi,",
					"offsetMilliseconds": 9520,
					"durationMilliseconds": 80
				},
				{
					"text": "Mary.",
					"offsetMilliseconds": 9600,
					"durationMilliseconds": 320
				}
			],
			"locale": "en-US",
			"confidence": 0.9177142
		},
		// More transcription results...
	    // Redacted for brevity
		{
			"channel": 1,
			"offsetMilliseconds": 4480,
			"durationMilliseconds": 1600,
			"text": "Hi, my name is Mary Rondo.",
			"words": [
				{
					"text": "Hi,",
					"offsetMilliseconds": 4480,
					"durationMilliseconds": 400
				},
				{
					"text": "my",
					"offsetMilliseconds": 4880,
					"durationMilliseconds": 120
				},
				{
					"text": "name",
					"offsetMilliseconds": 5000,
					"durationMilliseconds": 120
				},
				{
					"text": "is",
					"offsetMilliseconds": 5120,
					"durationMilliseconds": 160
				},
				{
					"text": "Mary",
					"offsetMilliseconds": 5280,
					"durationMilliseconds": 240
				},
				{
					"text": "Rondo.",
					"offsetMilliseconds": 5520,
					"durationMilliseconds": 560
				}
			],
			"locale": "en-US",
			"confidence": 0.8989456
		},
		{
			"channel": 1,
			"offsetMilliseconds": 6080,
			"durationMilliseconds": 1920,
			"text": "I'm trying to enroll myself with Contuso.",
			"words": [
				{
					"text": "I'm",
					"offsetMilliseconds": 6080,
					"durationMilliseconds": 160
				},
				{
					"text": "trying",
					"offsetMilliseconds": 6240,
					"durationMilliseconds": 200
				},
				{
					"text": "to",
					"offsetMilliseconds": 6440,
					"durationMilliseconds": 80
				},
				{
					"text": "enroll",
					"offsetMilliseconds": 6520,
					"durationMilliseconds": 200
				},
				{
					"text": "myself",
					"offsetMilliseconds": 6720,
					"durationMilliseconds": 360
				},
				{
					"text": "with",
					"offsetMilliseconds": 7080,
					"durationMilliseconds": 120
				},
				{
					"text": "Contuso.",
					"offsetMilliseconds": 7200,
					"durationMilliseconds": 800
				}
			],
			"locale": "en-US",
			"confidence": 0.8989456
		},
		// More transcription results...
	    // Redacted for brevity
    ]
}

Možnosti konfigurace požadavku

Tady jsou některé možnosti vlastností, které můžete nakonfigurovat přepis při volání operace Přepisy – Přepis .

Vlastnost	Popis	Požadované nebo volitelné
`channels`	Seznam indexů založených na nule kanálů, které se mají přepisovat samostatně. Pokud není povolená diarizace, podporují se až dva kanály. Ve výchozím nastavení rozhraní API pro rychlé přepisy sloučí všechny vstupní kanály do jednoho kanálu a pak provede přepis. Pokud to není žádoucí, kanály je možné přepisovat nezávisle bez sloučení. Chcete-li přepsat kanály ze stereo zvukového souboru samostatně, je třeba zadat `[0,1]`, `[0]`nebo `[1]`. V opačném případě se stereo zvuk sloučí do monofonu a přepisuje se jenom jeden kanál. Pokud je zvuk stereo a diarizace je povolena, nemůžete vlastnost nastavit `channels` na `[0,1]`. Služba Speech nepodporuje diarizaci více kanálů. U monofonního zvuku `channels` je vlastnost ignorována a zvuk se vždy přepisuje jako jeden kanál.	Volitelné
`diarization`	Konfigurace diarizace. Diarizace je proces rozpoznávání a oddělení více reproduktorů v jednom zvukovém kanálu. Zadejte například `"diarization": {"maxSpeakers": 2, "enabled": true}`. Soubor přepisu pak obsahuje `speaker` položky (například `"speaker": 0` ) `"speaker": 1`pro každou přepisovanou frázi.	Volitelné
`locales`	Seznam národních prostředí, která by měla odpovídat očekávanému národnímu prostředí zvukových dat k přepisu. Pokud znáte národní prostředí zvukového souboru, můžete ho určit, aby se zlepšila přesnost přepisu a minimalizovala se latence. Pokud je zadáno jedno národní prostředí, použije se k přepisu toto národní prostředí. Pokud si ale nejste jistí národním prostředím, můžete zadat více národních prostředí. Identifikace jazyka může být přesnější s přesnějším seznamem kandidátských národních prostředí. Pokud nezadáte žádné národní prostředí nebo pokud zadaná národní prostředí nejsou ve zvukovém souboru, služba Speech se stále snaží identifikovat jazyk. Pokud jazyk nejde identifikovat, vrátí se chyba. Podporovaná národní prostředí, která můžete zadat, jsou: de-DE, en-IN, en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR a zh-CN. Nejnovější podporované jazyky můžete získat prostřednictvím přepisů – seznam podporovaných národních prostředí REST API. Další informace o národních prostředích najdete v dokumentaci podpory jazyka služby Speech.	Volitelné, ale doporučuje se, pokud znáte očekávané národní prostředí.
`profanityFilterMode`	Určuje způsob zpracování vulgárních výrazů ve výsledcích rozpoznávání. Přijaté hodnoty jsou `None` zakázání filtrování vulgárních výrazů, `Masked` nahrazení vulgárních výrazů hvězdičkami, `Removed` odebrání všech vulgárních výrazů z výsledku nebo `Tags` přidání značek vulgárních výrazů. Výchozí hodnota je `Masked`.	Volitelné

Sdílet prostřednictvím

Použití rozhraní API pro rychlý přepis s využitím služby Azure AI Speech

Požadavky

Použití rozhraní API pro rychlý přepis

Možnosti konfigurace požadavku

Váš názor

Další materiály

Sdílet prostřednictvím

Použití rozhraní API pro rychlý přepis s využitím služby Azure AI Speech

Požadavky

Použití rozhraní API pro rychlý přepis

Možnosti konfigurace požadavku

Související obsah

Váš názor

Další materiály