Structured text phonetic pronunciation data

You can specify the phonetic pronunciation of words using the Universal Phone Set (UPS) in a structured text data file. The UPS is a machine-readable phone set that is based on the International Phonetic Alphabet (IPA). The IPA is a standard used by linguists worldwide.

UPS pronunciations consist of a string of UPS phonemes, each separated by whitespace. UPS phoneme labels are all defined using ASCII character strings.
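As a minimal sketch of the format described above, the following Python splits a pronunciation string on whitespace and rejects any label that isn't ASCII. The function name `split_ups` is hypothetical and isn't part of the Speech service; the check covers only the two properties stated here, not membership in a locale's phone set.

```python
from typing import List


def split_ups(pronunciation: str) -> List[str]:
    """Split a UPS pronunciation string into phoneme labels.

    Hypothetical helper: it checks only that labels are whitespace-separated
    ASCII strings, not that each label exists in a given locale's phone set.
    """
    labels = pronunciation.split()  # split on any run of whitespace
    for label in labels:
        if not label.isascii():
            raise ValueError(f"non-ASCII UPS label: {label!r}")
    return labels


print(split_ups("B IH G"))  # ['B', 'IH', 'G']
```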

For steps on implementing UPS, see Structured text phonetic pronunciation. Structured text phonetic pronunciation data is separate from pronunciation data, and the two cannot be used together. Pronunciation data is "sounds-like" or spoken-form data: it is input as a separate file and trains the model on what the spoken form sounds like.

Structured text phonetic pronunciation data is specified per syllable in a markdown file. Pronunciation data, by contrast, is input on its own and trains the model on what the spoken form sounds like. You can either use a pronunciation data file on its own or add pronunciation within a structured text data file. The Speech service doesn't support training a model with both of those datasets as input.

See the sections in this article for the Universal Phone Set for each locale.

en-US

Consonants

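| UPS Phonemes | IPA | Example |
|---|---|---|
| B | b | big |
| CH | t.ʃ / ʧ | chin |
| D | d | dig |
| DH | ð | then |
| F | f | fork |
| G | g | gut |
| H | h | help |
| JH | d.ʒ / ʤ | joy |
| K | k | cut |
| L | l | lid |
| M | m | mat |
| N | n | no |
| NG | ŋ | sing |
| P | p | put |
| R | ɻ | red |
| S | s | sit |
| SH | ʃ | she |
| T | t | talk |
| TH | θ | thin |
| V | v | vat |
| W | w | with |
| J | j | yard |
| Z | z | zap |
| ZH | ʒ | pleasure |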

Vowels

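| UPS Phonemes | IPA | Example |
|---|---|---|
| AA | ɑ | father |
| AE | æ | cat |
| AH | ʌ | cut |
| AO | ɔ | dog |
| AOX | ɔ.ə | four |
| AU | ɑ.ʊ | foul |
| AX | ə | ago |
| AX R | ɚ | minor |
| AI | ɑ.ɪ | bite |
| EH | ɛ | pet |
| EHX | ɛ.ə | stairs |
| ER R | ɝ | urban |
| EI | e.ɪ | ate |
| IH | ɪ | fill |
| I | i | feel |
| O | o | go |
| OI | ɔ.ɪ | toy |
| OWX | o.ə | boa |
| Q | ɒ | hot |
| UH | ʊ | book |
| U | u | too, blue |
| UWX | u.ə | lure |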
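To illustrate how the en-US tables above can be read, here is a rough Python sketch that maps a UPS pronunciation string to its IPA equivalents. The names `UPS_TO_IPA` and `ups_to_ipa` are hypothetical and not part of any Speech service API. Two assumptions are made: where the table lists two IPA forms (for example `t.ʃ / ʧ`), the first is used, and the two-token labels `AX R` and `ER R` are matched greedily before single tokens.

```python
# en-US UPS labels mapped to IPA, copied from the tables above.
# Assumption: when two IPA forms are listed, the first one is used.
UPS_TO_IPA = {
    # Consonants
    "B": "b", "CH": "t.ʃ", "D": "d", "DH": "ð", "F": "f", "G": "g",
    "H": "h", "JH": "d.ʒ", "K": "k", "L": "l", "M": "m", "N": "n",
    "NG": "ŋ", "P": "p", "R": "ɻ", "S": "s", "SH": "ʃ", "T": "t",
    "TH": "θ", "V": "v", "W": "w", "J": "j", "Z": "z", "ZH": "ʒ",
    # Vowels
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AOX": "ɔ.ə",
    "AU": "ɑ.ʊ", "AX": "ə", "AX R": "ɚ", "AI": "ɑ.ɪ", "EH": "ɛ",
    "EHX": "ɛ.ə", "ER R": "ɝ", "EI": "e.ɪ", "IH": "ɪ", "I": "i",
    "O": "o", "OI": "ɔ.ɪ", "OWX": "o.ə", "Q": "ɒ", "UH": "ʊ",
    "U": "u", "UWX": "u.ə",
}


def ups_to_ipa(pronunciation: str) -> str:
    """Convert a whitespace-separated UPS string to space-separated IPA.

    Hypothetical helper: two-token labels ("AX R", "ER R") are tried
    first, then single tokens. Unknown labels raise ValueError.
    """
    tokens = pronunciation.split()
    ipa = []
    i = 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and pair in UPS_TO_IPA:
            ipa.append(UPS_TO_IPA[pair])
            i += 2
        elif tokens[i] in UPS_TO_IPA:
            ipa.append(UPS_TO_IPA[tokens[i]])
            i += 1
        else:
            raise ValueError(f"unknown en-US UPS label: {tokens[i]!r}")
    return " ".join(ipa)


print(ups_to_ipa("B IH G"))       # b ɪ g
print(ups_to_ipa("M AI N AX R"))  # m ɑ.ɪ n ɚ
```

Because of the greedy match, this sketch always reads `AX` followed by `R` as the single vowel `ɚ`, even where a schwa followed by a separate `R` consonant was intended. How the Speech service itself resolves that case is not stated in this article.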

Next steps