Sentiment Analysis and Opinion Mining language support

Use this article to learn which languages are supported by Sentiment Analysis and Opinion Mining. Both the cloud-based API and Docker containers support the same languages.
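As a rough illustration of where a language code from the tables below is used, the following sketch builds a JSON request body for a single document. The field names (`kind`, `parameters`, `opinionMining`, `analysisInput`, `documents`, `language`) are assumptions about the request shape and are not defined in this article, so check them against the API reference before relying on them. `build_sentiment_request` is a hypothetical helper, not part of any SDK.

```python
import json


def build_sentiment_request(text, language, opinion_mining=False, doc_id="1"):
    """Build a request body for one document (hypothetical helper).

    The field names below are assumed; this article does not define them.
    """
    return {
        "kind": "SentimentAnalysis",
        "parameters": {"opinionMining": opinion_mining},
        "analysisInput": {
            "documents": [
                # `language` carries a code from the language support tables.
                {"id": doc_id, "language": language, "text": text}
            ]
        },
    }


body = build_sentiment_request("Das Hotel war wunderbar.", "de", opinion_mining=True)
print(json.dumps(body, indent=2, ensure_ascii=False))
```

The sketch only prints the body. Sending it to the cloud API or to a container is left out because the endpoint and authentication details depend on your deployment.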

Sentiment Analysis language support

Total supported language codes: 94

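| Language | Language code | Notes |
|---|---|---|
| Afrikaans | af | |
| Albanian | sq | |
| Amharic | am | |
| Arabic | ar | |
| Armenian | hy | |
| Assamese | as | |
| Azerbaijani | az | |
| Basque | eu | |
| Belarusian (new) | be | |
| Bengali | bn | |
| Bosnian | bs | |
| Breton (new) | br | |
| Bulgarian | bg | |
| Burmese | my | |
| Catalan | ca | |
| Chinese (Simplified) | zh-hans | zh also accepted |
| Chinese (Traditional) | zh-hant | |
| Croatian | hr | |
| Czech | cs | |
| Danish | da | |
| Dutch | nl | |
| English | en | |
| Esperanto (new) | eo | |
| Estonian | et | |
| Filipino | fil | |
| Finnish | fi | |
| French | fr | |
| Galician | gl | |
| Georgian | ka | |
| German | de | |
| Greek | el | |
| Gujarati | gu | |
| Hausa (new) | ha | |
| Hebrew | he | |
| Hindi | hi | |
| Hungarian | hu | |
| Indonesian | id | |
| Irish | ga | |
| Italian | it | |
| Japanese | ja | |
| Javanese (new) | jv | |
| Kannada | kn | |
| Kazakh | kk | |
| Khmer | km | |
| Korean | ko | |
| Kurdish (Kurmanji) | ku | |
| Kyrgyz | ky | |
| Lao | lo | |
| Latin (new) | la | |
| Latvian | lv | |
| Lithuanian | lt | |
| Macedonian | mk | |
| Malagasy | mg | |
| Malay | ms | |
| Malayalam | ml | |
| Marathi | mr | |
| Mongolian | mn | |
| Nepali | ne | |
| Norwegian | no | |
| Odia | or | |
| Oromo (new) | om | |
| Pashto | ps | |
| Persian | fa | |
| Polish | pl | |
| Portuguese (Portugal) | pt-PT | pt also accepted |
| Portuguese (Brazil) | pt-BR | |
| Punjabi | pa | |
| Romanian | ro | |
| Russian | ru | |
| Sanskrit (new) | sa | |
| Scottish Gaelic (new) | gd | |
| Serbian | sr | |
| Sindhi (new) | sd | |
| Sinhala (new) | si | |
| Slovak | sk | |
| Slovenian | sl | |
| Somali | so | |
| Spanish | es | |
| Sundanese (new) | su | |
| Swahili | sw | |
| Swedish | sv | |
| Tamil | ta | |
| Telugu | te | |
| Thai | th | |
| Turkish | tr | |
| Ukrainian | uk | |
| Urdu | ur | |
| Uyghur | ug | |
| Uzbek | uz | |
| Vietnamese | vi | |
| Welsh | cy | |
| Western Frisian (new) | fy | |
| Xhosa (new) | xh | |
| Yiddish (new) | yi | |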

Opinion Mining language support

Total supported language codes: 94

| Language | Language code | Notes |
|---|---|---|
| Afrikaans (new) | af | |
| Albanian (new) | sq | |
| Amharic (new) | am | |
| Arabic | ar | |
| Armenian (new) | hy | |
| Assamese (new) | as | |
| Azerbaijani (new) | az | |
| Basque (new) | eu | |
| Belarusian (new) | be | |
| Bengali | bn | |
| Bosnian (new) | bs | |
| Breton (new) | br | |
| Bulgarian (new) | bg | |
| Burmese (new) | my | |
| Catalan (new) | ca | |
| Chinese (Simplified) | zh-hans | zh also accepted |
| Chinese (Traditional) (new) | zh-hant | |
| Croatian (new) | hr | |
| Czech (new) | cs | |
| Danish | da | |
| Dutch | nl | |
| English | en | |
| Esperanto (new) | eo | |
| Estonian (new) | et | |
| Filipino (new) | fil | |
| Finnish | fi | |
| French | fr | |
| Galician (new) | gl | |
| Georgian (new) | ka | |
| German | de | |
| Greek | el | |
| Gujarati (new) | gu | |
| Hausa (new) | ha | |
| Hebrew (new) | he | |
| Hindi | hi | |
| Hungarian | hu | |
| Indonesian | id | |
| Irish (new) | ga | |
| Italian | it | |
| Japanese | ja | |
| Javanese (new) | jv | |
| Kannada (new) | kn | |
| Kazakh (new) | kk | |
| Khmer (new) | km | |
| Korean | ko | |
| Kurdish (Kurmanji) | ku | |
| Kyrgyz (new) | ky | |
| Lao (new) | lo | |
| Latin (new) | la | |
| Latvian (new) | lv | |
| Lithuanian (new) | lt | |
| Macedonian (new) | mk | |
| Malagasy (new) | mg | |
| Malay (new) | ms | |
| Malayalam (new) | ml | |
| Marathi | mr | |
| Mongolian (new) | mn | |
| Nepali (new) | ne | |
| Norwegian | no | |
| Odia (new) | or | |
| Oromo (new) | om | |
| Pashto (new) | ps | |
| Persian (new) | fa | |
| Polish | pl | |
| Portuguese (Portugal) | pt-PT | pt also accepted |
| Portuguese (Brazil) | pt-BR | |
| Punjabi (new) | pa | |
| Romanian (new) | ro | |
| Russian | ru | |
| Sanskrit (new) | sa | |
| Scottish Gaelic (new) | gd | |
| Serbian (new) | sr | |
| Sindhi (new) | sd | |
| Sinhala (new) | si | |
| Slovak (new) | sk | |
| Slovenian (new) | sl | |
| Somali (new) | so | |
| Spanish | es | |
| Sundanese (new) | su | |
| Swahili (new) | sw | |
| Swedish | sv | |
| Tamil | ta | |
| Telugu | te | |
| Thai (new) | th | |
| Turkish | tr | |
| Ukrainian (new) | uk | |
| Urdu (new) | ur | |
| Uyghur (new) | ug | |
| Uzbek (new) | uz | |
| Vietnamese (new) | vi | |
| Welsh (new) | cy | |
| Western Frisian (new) | fy | |
| Xhosa (new) | xh | |
| Yiddish (new) | yi | |
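
Because two entries accept an alternate code (`zh` for `zh-hans`, `pt` for `pt-PT`), it can help to validate and normalize a code before you send a document. The following is a minimal sketch built from the tables above, which list the same 94 codes for both features. `normalize_language_code` is a hypothetical helper, and the case-insensitive matching is an assumption of this sketch, not documented service behavior.

```python
# Language codes copied from the tables above (94 entries).
SUPPORTED_CODES = frozenset("""
af sq am ar hy as az eu be bn bs br bg my ca zh-hans zh-hant hr cs da nl en
eo et fil fi fr gl ka de el gu ha he hi hu id ga it ja jv kn kk km ko ku ky
lo la lv lt mk mg ms ml mr mn ne no or om ps fa pl pt-PT pt-BR pa ro ru sa
gd sr sd si sk sl so es su sw sv ta te th tr uk ur ug uz vi cy fy xh yi
""".split())

# Alternate codes listed in the Notes column.
ALIASES = {"zh": "zh-hans", "pt": "pt-PT"}

# Lowercased lookup so that "PT-br" resolves to "pt-BR" (sketch assumption).
_CANONICAL = {code.lower(): code for code in SUPPORTED_CODES}


def normalize_language_code(code):
    """Return the code as written in the tables, or raise ValueError."""
    key = code.strip().lower()
    key = ALIASES.get(key, key).lower()
    try:
        return _CANONICAL[key]
    except KeyError:
        raise ValueError(f"Unsupported language code: {code!r}") from None


print(normalize_language_code("zh"))     # zh-hans
print(normalize_language_code("pt"))     # pt-PT
print(normalize_language_code("PT-br"))  # pt-BR
```

Raising an error for unknown codes, rather than silently falling back to a default language, is a design choice of this sketch. It surfaces typos early instead of producing results for the wrong language.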

Multi-lingual option (Custom sentiment analysis only)

With Custom sentiment analysis, you can train a model in one language and use it to classify documents in another language. This saves time and effort: instead of building a separate project for every language, you can handle a multilingual dataset in one project. Your dataset doesn't have to be entirely in one language, but you should enable the multi-lingual option for your project, either when you create it or later in the project settings. If you notice your model performing poorly in certain languages during evaluation, consider adding more data in those languages to your training set.

You can train your project entirely with English documents and query it in French, German, Mandarin, Japanese, Korean, and other languages. Custom sentiment analysis makes it easy to scale your projects to multiple languages by using multilingual technology to train your models.

Whenever you find that a particular language isn't performing as well as the others, you can add more documents for that language to your project.

You aren't expected to add the same number of documents for every language. Build the majority of your project in one language, and add only a few documents in the languages that you observe aren't performing well. For example, if you create a project that is primarily in English and test it in French, German, and Spanish, you might observe that German doesn't perform as well as the other two languages. In that case, consider adding 5% of your original English documents in German, then train a new model and test in German again. You should see better results for German queries. The more labeled documents you add, the more likely the results are to improve.
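
To make the 5% guideline concrete, the sketch below computes how many documents that would be for a given project size. It assumes the guideline refers to a document count (5% of the number of English documents) and rounds up to a whole document. `suggested_additions` is a hypothetical helper name, not a service API, and the project size is an illustrative number.

```python
def suggested_additions(primary_doc_count, percent=5):
    """Return the number of documents equal to `percent` of the primary set.

    Uses integer ceiling division so the result is a whole document count.
    """
    if primary_doc_count < 0 or percent < 0:
        raise ValueError("counts and percentages must be non-negative")
    return -(-primary_doc_count * percent // 100)


# A project with 2,000 English documents (illustrative number).
print(suggested_additions(2000))  # 100
```

Treat the result as a starting point: retrain, evaluate the language again, and add more labeled documents only if the results still fall short.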

When you add data in another language, you shouldn't expect it to negatively affect other languages.

Next steps