Tutorial: Translator with Azure AI services

Translator is an Azure AI service that lets you perform language translation and other language-related operations. In this tutorial, you learn how to use Translator to build intelligent, multilingual solutions in Azure Synapse Analytics.

This tutorial demonstrates how to use Translator with MMLSpark to:

  • Translate text
  • Transliterate text
  • Detect language
  • Break sentences
  • Look up dictionary entries
  • Retrieve dictionary examples

If you don't have an Azure subscription, create a free account before you begin.

Prerequisites

Get started

Open Synapse Studio and create a new notebook. To get started, import MMLSpark.

import mmlspark
from mmlspark.cognitive import *
from notebookutils import mssparkutils
from pyspark.sql.functions import col, flatten

Configure Translator

Use the linked Translator service that you configured in the pre-configuration steps.

ai_service_name = "<Your linked service for translator>"

Translate text

The core operation of the Translator service is translating text.

df = spark.createDataFrame([
  (["Hello, what is your name?", "Bye"],)
], ["text",])

translate = (Translate()
    .setLinkedService(ai_service_name)
    .setTextCol("text")
    .setToLanguage(["zh-Hans", "fr"])
    .setOutputCol("translation")
    .setConcurrency(5))

display(translate
      .transform(df)
      .withColumn("translation", flatten(col("translation.translations")))
      .withColumn("translation", col("translation.text"))
      .select("translation"))

Expected result

["你好,你叫什么名字?","Bonjour, quel est votre nom?","再见","Au revoir"]
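
The flattened result lists translations input by input, with one entry per target language. As a minimal sketch in plain Python (no Spark required), the following regroups such a flat list by input text and target language. The helper name `group_translations` is hypothetical, and the ordering is an assumption taken from the expected result shown above.

```python
def group_translations(texts, to_languages, flat_translations):
    """Regroup a flat translation list into {text: {language: translation}}.

    Assumes the flat list is ordered input by input, with one entry per
    target language in the order the languages were requested.
    """
    n_langs = len(to_languages)
    if len(flat_translations) != len(texts) * n_langs:
        raise ValueError("expected one translation per (text, language) pair")
    grouped = {}
    for i, text in enumerate(texts):
        # Take the slice of translations that belongs to the i-th input.
        chunk = flat_translations[i * n_langs:(i + 1) * n_langs]
        grouped[text] = dict(zip(to_languages, chunk))
    return grouped


texts = ["Hello, what is your name?", "Bye"]
to_languages = ["zh-Hans", "fr"]
flat = ["你好,你叫什么名字?", "Bonjour, quel est votre nom?", "再见", "Au revoir"]

grouped = group_translations(texts, to_languages, flat)
print(grouped["Bye"]["fr"])
```

Keeping the grouping outside Spark is only for illustration; in a notebook the same idea could be applied after collecting a small result set.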

Transliterate text

Transliteration is the process of converting a word or phrase from the script (alphabet) of one language to that of another, based on phonetic similarity.

transliterateDf = spark.createDataFrame([
  (["こんにちは", "さようなら"],)
], ["text",])

transliterate = (Transliterate()
    .setLinkedService(ai_service_name)
    .setLanguage("ja")
    .setFromScript("Jpan")
    .setToScript("Latn")
    .setTextCol("text")
    .setOutputCol("result"))

display(transliterate
    .transform(transliterateDf)
    .withColumn("text", col("result.text"))
    .withColumn("script", col("result.script"))
    .select("text", "script"))

Expected result

text script
"["Kon'nichiwa","sayonara"]" "["Latn","Latn"]"

Detect language

If you know that you need a translation but don't know the language of the text that will be sent to the Translator service, you can use the language detection operation.

detectDf = spark.createDataFrame([
  (["Hello, what is your name?"],)
], ["text",])

detect = (Detect()
    .setLinkedService(ai_service_name)
    .setTextCol("text")
    .setOutputCol("result"))

display(detect
    .transform(detectDf)
    .withColumn("language", col("result.language"))
    .select("language"))

Expected result

"["en"]"

Break sentences

This operation identifies the position of sentence boundaries in a piece of text.

bsDf = spark.createDataFrame([
  (["Hello, what is your name?"],)
], ["text",])

breakSentence = (BreakSentence()
    .setLinkedService(ai_service_name)
    .setTextCol("text")
    .setOutputCol("result"))

display(breakSentence
    .transform(bsDf)
    .withColumn("sentLen", flatten(col("result.sentLen")))
    .select("sentLen"))

Expected result

"[25]"

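The sentence lengths can be turned back into sentence strings by slicing the original text. The following is a minimal plain-Python sketch of that idea. The helper name `split_by_lengths` is hypothetical, the two-sentence input and its lengths are assumed values for illustration, and the sketch assumes each length counts Python string characters.

```python
def split_by_lengths(text, sent_lengths):
    """Slice text into sentences using consecutive sentence lengths."""
    sentences = []
    start = 0
    for length in sent_lengths:
        sentences.append(text[start:start + length])
        start += length
    return sentences


# Single-sentence case from this tutorial: one length covers the whole text.
single = split_by_lengths("Hello, what is your name?", [25])
print(single)

# Two-sentence case; these lengths are assumed values for illustration.
double = split_by_lengths("How are you? I am fine.", [13, 10])
print(double)
```
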
Dictionary lookup (alternative translations)

With the dictionary lookup endpoint, you can get alternative translations for a word or phrase.

dictDf = spark.createDataFrame([
  (["fly"],)
], ["text",])

dictionaryLookup = (DictionaryLookup()
    .setLinkedService(ai_service_name)
    .setFromLanguage("en")
    .setToLanguage("es")
    .setTextCol("text")
    .setOutputCol("result"))

display(dictionaryLookup
    .transform(dictDf)
    .withColumn("translations", flatten(col("result.translations")))
    .withColumn("normalizedTarget", col("translations.normalizedTarget"))
    .select("normalizedTarget"))

Expected result

normalizedTarget
"["volar","mosca","operan","pilotar","moscas","marcha"]"

Dictionary examples (translations in context)

After you perform a dictionary lookup, you can pass the source text and its translation to the dictionary examples endpoint to get a list of examples that show both terms in the context of a sentence or phrase.

dictDf = spark.createDataFrame([
  ([("fly", "volar")],)
], ["textAndTranslation",])

dictionaryExamples = (DictionaryExamples()
    .setLinkedService(ai_service_name)
    .setFromLanguage("en")
    .setToLanguage("es")
    .setTextAndTranslationCol("textAndTranslation")
    .setOutputCol("result"))

display(dictionaryExamples
    .transform(dictDf)
    .withColumn("examples", flatten(col("result.examples")))
    .select("examples"))

Expected result


[{"sourcePrefix":"I mean, for a guy who could ","sourceSuffix":".","targetPrefix":"Quiero decir, para un tipo que podía ","targetTerm":"volar","sourceTerm":"fly","targetSuffix":"."},{"sourcePrefix":"Now it's time to make you ","sourceSuffix":".","targetPrefix":"Ahora es hora de que te haga ","targetTerm":"volar","sourceTerm":"fly","targetSuffix":"."},{"sourcePrefix":"One happy thought will make you ","sourceSuffix":".","targetPrefix":"Uno solo te hará ","targetTerm":"volar","sourceTerm":"fly","targetSuffix":"."},{"sourcePrefix":"They need machines to ","sourceSuffix":".","targetPrefix":"Necesitan máquinas para ","targetTerm":"volar","sourceTerm":"fly","targetSuffix":"."},{"sourcePrefix":"That should really ","sourceSuffix":".","targetPrefix":"Eso realmente debe ","targetTerm":"volar","sourceTerm":"fly","targetSuffix":"."},{"sourcePrefix":"It sure takes longer when you can't ","sourceSuffix":".","targetPrefix":"Lleva más tiempo cuando no puedes ","targetTerm":"volar","sourceTerm":"fly","targetSuffix":"."},{"sourcePrefix":"I have to ","sourceSuffix":" home in the morning.","targetPrefix":"Tengo que ","targetTerm":"volar","sourceTerm":"fly","targetSuffix":" a casa por la mañana."},{"sourcePrefix":"You taught me to ","sourceSuffix":".","targetPrefix":"Me enseñaste a ","targetTerm":"volar","sourceTerm":"fly","targetSuffix":"."},{"sourcePrefix":"I think you should ","sourceSuffix":" with the window closed.","targetPrefix":"Creo que debemos ","targetTerm":"volar","sourceTerm":"fly","targetSuffix":" con la ventana cerrada."},{"sourcePrefix":"They look like they could ","sourceSuffix":".","targetPrefix":"Parece que pudieran ","targetTerm":"volar","sourceTerm":"fly","targetSuffix":"."},{"sourcePrefix":"But you can ","sourceSuffix":", for instance?","targetPrefix":"Que puedes ","targetTerm":"volar","sourceTerm":"fly","targetSuffix":", por ejemplo."},{"sourcePrefix":"At least until her kids can be able to ","sourceSuffix":".","targetPrefix":"Al menos hasta que sus hijos sean capaces de ","targetTerm":"volar","sourceTerm":"fly","targetSuffix":"."},{"sourcePrefix":"I thought you could ","sourceSuffix":".","targetPrefix":"Pensé que podías ","targetTerm":"volar","sourceTerm":"fly","targetSuffix":"."},{"sourcePrefix":"I was wondering what it would be like to ","sourceSuffix":".","targetPrefix":"Me preguntaba cómo sería ","targetTerm":"volar","sourceTerm":"fly","targetSuffix":"."},{"sourcePrefix":"But nobody else can ","sourceSuffix":".","targetPrefix":"Pero nadie puede ","targetTerm":"volar","sourceTerm":"fly","targetSuffix":"."}]
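
Each example splits a sentence into a prefix, the term, and a suffix, for both the source and the target language. As a minimal sketch in plain Python, the following joins those parts back into full sentence pairs. The helper name `to_sentence_pairs` is hypothetical, and the two sample entries are copied from the expected result above.

```python
import json

# Two entries copied from the expected result above, as a JSON string.
examples_json = """
[
  {"sourcePrefix": "I have to ", "sourceTerm": "fly",
   "sourceSuffix": " home in the morning.",
   "targetPrefix": "Tengo que ", "targetTerm": "volar",
   "targetSuffix": " a casa por la mañana."},
  {"sourcePrefix": "You taught me to ", "sourceTerm": "fly",
   "sourceSuffix": ".",
   "targetPrefix": "Me enseñaste a ", "targetTerm": "volar",
   "targetSuffix": "."}
]
"""


def to_sentence_pairs(examples):
    """Join prefix, term, and suffix into (source, target) sentence pairs."""
    pairs = []
    for ex in examples:
        source = ex["sourcePrefix"] + ex["sourceTerm"] + ex["sourceSuffix"]
        target = ex["targetPrefix"] + ex["targetTerm"] + ex["targetSuffix"]
        pairs.append((source, target))
    return pairs


pairs = to_sentence_pairs(json.loads(examples_json))
for source, target in pairs:
    print(f"{source} -> {target}")
```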

Clean up resources

To ensure the Spark instance is shut down, end any connected sessions (notebooks). The pool shuts down when the idle time specified in the Apache Spark pool is reached. You can also select stop session from the status bar at the upper right of the notebook.

Next steps