Sentiment Analysis and Opinion Mining language support

Article
08/28/2024

Use this article to learn which languages are supported by Sentiment Analysis and Opinion Mining. Both the cloud-based API and Docker containers support the same languages.

Sentiment Analysis language support

Total supported language codes: 94

Language	Language code	Notes
Afrikaans	`af`
Albanian	`sq`
Amharic	`am`
Arabic	`ar`
Armenian	`hy`
Assamese	`as`
Azerbaijani	`az`
Basque	`eu`
Belarusian (new)	`be`
Bengali	`bn`
Bosnian	`bs`
Breton (new)	`br`
Bulgarian	`bg`
Burmese	`my`
Catalan	`ca`
Chinese (Simplified)	`zh-hans`	`zh` also accepted
Chinese (Traditional)	`zh-hant`
Croatian	`hr`
Czech	`cs`
Danish	`da`
Dutch	`nl`
English	`en`
Esperanto (new)	`eo`
Estonian	`et`
Filipino	`fil`
Finnish	`fi`
French	`fr`
Galician	`gl`
Georgian	`ka`
German	`de`
Greek	`el`
Gujarati	`gu`
Hausa (new)	`ha`
Hebrew	`he`
Hindi	`hi`
Hungarian	`hu`
Indonesian	`id`
Irish	`ga`
Italian	`it`
Japanese	`ja`
Javanese (new)	`jv`
Kannada	`kn`
Kazakh	`kk`
Khmer	`km`
Korean	`ko`
Kurdish (Kurmanji)	`ku`
Kyrgyz	`ky`
Lao	`lo`
Latin (new)	`la`
Latvian	`lv`
Lithuanian	`lt`
Macedonian	`mk`
Malagasy	`mg`
Malay	`ms`
Malayalam	`ml`
Marathi	`mr`
Mongolian	`mn`
Nepali	`ne`
Norwegian	`no`
Odia	`or`
Oromo (new)	`om`
Pashto	`ps`
Persian	`fa`
Polish	`pl`
Portuguese (Portugal)	`pt-PT`	`pt` also accepted
Portuguese (Brazil)	`pt-BR`
Punjabi	`pa`
Romanian	`ro`
Russian	`ru`
Sanskrit (new)	`sa`
Scottish Gaelic (new)	`gd`
Serbian	`sr`
Sindhi (new)	`sd`
Sinhala (new)	`si`
Slovak	`sk`
Slovenian	`sl`
Somali	`so`
Spanish	`es`
Sundanese (new)	`su`
Swahili	`sw`
Swedish	`sv`
Tamil	`ta`
Telugu	`te`
Thai	`th`
Turkish	`tr`
Ukrainian	`uk`
Urdu	`ur`
Uyghur	`ug`
Uzbek	`uz`
Vietnamese	`vi`
Welsh	`cy`
Western Frisian (new)	`fy`
Xhosa (new)	`xh`
Yiddish (new)	`yi`
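
The Notes column above says that `zh` is accepted for `zh-hans` and `pt` for `pt-PT`. As a minimal sketch of how a client might resolve those aliases before sending text for analysis, the helper below maps a code to its canonical form. The function name, the `SUPPORTED` subset, and the case-insensitive matching are illustrative assumptions, not part of the service.

```python
# Aliases taken from the Notes column: the short code is also accepted.
ALIASES = {"zh": "zh-hans", "pt": "pt-PT"}

# A small subset of the codes in the table, for illustration only.
SUPPORTED = {"en", "fr", "de", "ja", "ko", "zh-hans", "zh-hant", "pt-PT", "pt-BR"}


def normalize_language_code(code: str) -> str:
    """Return the canonical table code for `code`, or raise ValueError.

    Case-insensitive matching is a convenience of this sketch (an assumption),
    not documented service behavior.
    """
    cleaned = code.strip()
    # Resolve a documented alias first, otherwise keep the input as given.
    canonical = ALIASES.get(cleaned.lower(), cleaned)
    # Look the code up ignoring case, returning the table's own spelling.
    by_lower = {c.lower(): c for c in SUPPORTED}
    try:
        return by_lower[canonical.lower()]
    except KeyError:
        raise ValueError(f"Unsupported language code: {code!r}") from None
```

For example, `normalize_language_code("zh")` returns `"zh-hans"`, and a code that is not in the set raises `ValueError`.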

Opinion Mining language support

Total supported language codes: 94

Language	Language code	Notes
Afrikaans (new)	`af`
Albanian (new)	`sq`
Amharic (new)	`am`
Arabic	`ar`
Armenian (new)	`hy`
Assamese (new)	`as`
Azerbaijani (new)	`az`
Basque (new)	`eu`
Belarusian (new)	`be`
Bengali	`bn`
Bosnian (new)	`bs`
Breton (new)	`br`
Bulgarian (new)	`bg`
Burmese (new)	`my`
Catalan (new)	`ca`
Chinese (Simplified)	`zh-hans`	`zh` also accepted
Chinese (Traditional) (new)	`zh-hant`
Croatian (new)	`hr`
Czech (new)	`cs`
Danish	`da`
Dutch	`nl`
English	`en`
Esperanto (new)	`eo`
Estonian (new)	`et`
Filipino (new)	`fil`
Finnish	`fi`
French	`fr`
Galician (new)	`gl`
Georgian (new)	`ka`
German	`de`
Greek	`el`
Gujarati (new)	`gu`
Hausa (new)	`ha`
Hebrew (new)	`he`
Hindi	`hi`
Hungarian	`hu`
Indonesian	`id`
Irish (new)	`ga`
Italian	`it`
Japanese	`ja`
Javanese (new)	`jv`
Kannada (new)	`kn`
Kazakh (new)	`kk`
Khmer (new)	`km`
Korean	`ko`
Kurdish (Kurmanji)	`ku`
Kyrgyz (new)	`ky`
Lao (new)	`lo`
Latin (new)	`la`
Latvian (new)	`lv`
Lithuanian (new)	`lt`
Macedonian (new)	`mk`
Malagasy (new)	`mg`
Malay (new)	`ms`
Malayalam (new)	`ml`
Marathi	`mr`
Mongolian (new)	`mn`
Nepali (new)	`ne`
Norwegian	`no`
Odia (new)	`or`
Oromo (new)	`om`
Pashto (new)	`ps`
Persian (new)	`fa`
Polish	`pl`
Portuguese (Portugal)	`pt-PT`	`pt` also accepted
Portuguese (Brazil)	`pt-BR`
Punjabi (new)	`pa`
Romanian (new)	`ro`
Russian	`ru`
Sanskrit (new)	`sa`
Scottish Gaelic (new)	`gd`
Serbian (new)	`sr`
Sindhi (new)	`sd`
Sinhala (new)	`si`
Slovak (new)	`sk`
Slovenian (new)	`sl`
Somali (new)	`so`
Spanish	`es`
Sundanese (new)	`su`
Swahili (new)	`sw`
Swedish	`sv`
Tamil	`ta`
Telugu	`te`
Thai (new)	`th`
Turkish	`tr`
Ukrainian (new)	`uk`
Urdu (new)	`ur`
Uyghur (new)	`ug`
Uzbek (new)	`uz`
Vietnamese (new)	`vi`
Welsh (new)	`cy`
Western Frisian (new)	`fy`
Xhosa (new)	`xh`
Yiddish (new)	`yi`
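
Each document sent for analysis carries one of the language codes above. The sketch below builds a request body with opinion mining enabled. It assumes the JSON shape of the Language service `analyze-text` REST operation (`kind`, `parameters.opinionMining`, `analysisInput.documents`); treat these field names as assumptions to verify against the REST reference. The helper name `build_sentiment_request` is hypothetical, and the code only constructs the payload without sending it.

```python
import json


def build_sentiment_request(docs, opinion_mining=True):
    """Build a request body from (language_code, text) pairs.

    The field names follow the assumed analyze-text request shape;
    check them against the REST reference before relying on them.
    """
    documents = [
        {"id": str(i), "language": lang, "text": text}
        for i, (lang, text) in enumerate(docs, start=1)
    ]
    return {
        "kind": "SentimentAnalysis",
        "parameters": {"opinionMining": opinion_mining},
        "analysisInput": {"documents": documents},
    }


if __name__ == "__main__":
    body = build_sentiment_request(
        [("en", "The rooms were great."), ("fr", "Le service était lent.")]
    )
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Setting the language per document lets one request mix several supported languages.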

Multi-lingual option (Custom sentiment analysis only)

With Custom sentiment analysis, you can train a model in one language and use it to classify documents in another language. This saves time and effort: instead of building a separate project for every language, you can handle a multilingual dataset in one project. Your dataset doesn't have to be entirely in the same language, but you should enable the multi-lingual option for your project, either when you create it or later in project settings. If you notice your model performing poorly in certain languages during evaluation, consider adding more data in those languages to your training set.

You can train your project entirely with English documents and query it in French, German, Mandarin, Japanese, Korean, and other languages. Custom sentiment analysis makes it easy to scale your projects to multiple languages by using multilingual technology to train your models.

Whenever you find that a particular language isn't performing as well as the others, you can add more documents in that language to your project.

You aren't expected to add the same number of documents for every language. You should build the majority of your project in one language, and only add a few documents in languages that you observe aren't performing well. For example, if you create a project that is primarily in English and start testing it in French, German, and Spanish, you might observe that German doesn't perform as well as the other two languages. In that case, consider adding German documents equal to 5% of your original English documents, training a new model, and testing in German again. You should see better results for German queries. The more labeled documents you add, the more likely the results are to improve.

Adding data in one language isn't expected to negatively affect the other languages.
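
To make the 5% guideline concrete, here is a minimal arithmetic sketch. The function name `top_up_count` and the choice to round up are assumptions for illustration; the guidance itself only suggests the 5% starting point.

```python
def top_up_count(primary_docs: int, percent: int = 5) -> int:
    """Return how many labeled documents to add in an underperforming language.

    `primary_docs` is the number of documents in the project's main language.
    The result is rounded up, so a small project still gets at least one
    document whenever `primary_docs` and `percent` are both positive.
    """
    if primary_docs < 0 or not 0 <= percent <= 100:
        raise ValueError("primary_docs must be >= 0 and percent within 0..100")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (primary_docs * percent + 99) // 100


if __name__ == "__main__":
    # A project with 400 English documents: start with 20 German documents.
    print(top_up_count(400))
```

If German results are still weak after retraining, raise `percent` and test again.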

Next steps

See how to call the API for more information.
Quickstart: Use the Sentiment Analysis client library and REST API
