Developing OpenType Fonts for Korean Hangul Script

06/16/2022

This document presents information that will help font developers create or support OpenType fonts for the Korean Hangul script covered by the Unicode Standard.

Introduction

The Korean Hangul script is a 'syllabic' script. The syllables are formed by combining sequences of elemental, alphabetic consonants and vowels. The process of composing syllables is additive in nature and follows a set of predefined rules. The glyph elements of each composed syllable are shaped and positioned into a square display cell, often referred to as a 'syllable block', or 'syllable glyph'.

The Unicode Standard provides encodings for pre-composed Hangul syllables, known as 'Modern Hangul', as well as encodings for the individual Hangul alphabetic elements, called 'Jamo', from which 'Old Hangul' syllables are composed. 'Modern Hangul' has 11,172 pre-composed characters in the Hangul Syllables block (U+AC00 through U+D7AF; the assigned syllables end at U+D7A3). 'Old Hangul' syllables can be composed from the individual Jamos encoded in the Unicode Hangul Jamo block (U+1100 through U+11FF). Only certain sequences of these Jamo characters combine to form Old Hangul syllables; these sequences are defined in Appendix B. A sequence of character codes from the Hangul Jamo block that does not match any sequence pattern in Appendix B is treated as a sequence of individual, non-Old Hangul characters.
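
As a rough illustration of these ranges, the Python sketch below classifies a code point with the letters S, L, V, T and X from the Notation section. The function name classify() is hypothetical, and the Jamo sub-ranges are an assumption read off the tables in Appendix B of this document rather than the full Hangul Jamo block.

```python
# Sketch only: the sub-ranges follow the Appendix B tables in this document;
# the name classify() is hypothetical.
HANGUL_SYLLABLES = range(0xAC00, 0xD7A4)  # the 11,172 pre-composed syllables
LEADING = range(0x1100, 0x115A)           # U+1100..U+1159
LEADING_FILLER = 0x115F                   # HANGUL CHOSEONG FILLER
VOWEL = range(0x1160, 0x11A3)             # U+1160 (filler)..U+11A2
TRAILING = range(0x11A8, 0x11FA)          # U+11A8..U+11F9


def classify(ch: str) -> str:
    """Return 'S', 'L', 'V', 'T' or 'X' for a single character."""
    cp = ord(ch)
    if cp in HANGUL_SYLLABLES:
        return "S"
    if cp in LEADING or cp == LEADING_FILLER:
        return "L"
    if cp in VOWEL:
        return "V"
    if cp in TRAILING:
        return "T"
    return "X"


print([classify(c) for c in "\u1107\u1169\u11af\ud55cA"])
```

The count of 11,172 follows from the range itself: 0xD7A3 - 0xAC00 + 1 = 11,172.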

In this specification, font developers will learn how to address Old Hangul syllable formation, encode complex script features in their fonts, choose character sets, organize font information, and use existing tools to produce Old Hangul fonts. Registered features of the Korean Hangul script are defined and illustrated, encodings are listed, and templates are included for compiling Korean Hangul layout tables for OpenType fonts.

This document also presents information about the Korean OpenType shaping engine of Uniscribe, the Windows component responsible for text layout.

In addition to being a primer and specification for the creation and support of Hangul fonts, this document is intended to more broadly illustrate the OpenType Layout architecture, feature schemes, and operating system support for shaping and positioning text.

Glossary

The following terms are useful for understanding the layout features and script rules discussed in this document.

Jamo - An individual Hangul alphabetic element; the atomic unit of a syllable. Consonants and vowels are both known as Jamos.

Consonant - Represents a single consonant sound. Consonants are further divided into leading consonants and trailing consonants.

Leading consonant (Leading Jamo) "Choseong" - the syllable initial character
Trailing consonant (Trailing Jamo) "Jongseong" - the syllable final character

Vowel (Vowel Jamo) "Jungseong" - A phoneme; an independent unit in a syllable. It does not combine with any consonant to result in the transformation of any consonant-vowel combination.

Notation

The following notation is used in this document to illustrate layout operations:

L – Leading consonant

V – Vowel

T – Trailing consonant

S – Syllable

X – Non-Jamo character

{ } – Indicates 0, 1 or multiple occurrence

[ ] – Indicates 0 or 1 occurrence

() – Indicates 1 or multiple occurrence

The Uniscribe Korean shaping engine processes text in stages. The stages are:

1. Compose Old Hangul Jamo combinations
2. Identify syllable boundaries with OTLS
3. Analyze the syllables
4. Shape glyphs with OTLS (OpenType Library Services)

The descriptions which follow will help font developers understand the rationale for the Korean Hangul feature encoding model, and help application developers better understand how layout clients can divide responsibilities with operating system functions.

Compose Old Hangul Jamo combinations

The shaping engine receives a sequence of characters (a character run) that has been divided into sequences of leading consonant (L), vowel (V) and trailing consonant (T) Jamos. In each of these sequences, the shaping engine identifies the longest string of characters that can combine to form a registered Jamo, according to the list of standard character combinations in Appendix B.

Next, it replaces that string with the corresponding Old Hangul Jamo. The process is then repeated on the next longest string in the sequence, and identification and replacement continue until all sequences have been processed.

The result of this process is a string of registered Old Hangul Jamos like the example below:

V1L1L2L3V2V3T1T2T3L4L5V4T4V5V6L6V7
---> V1L1(L2L3)V2V3(T1T2T3)L4L5V4T4(V5V6)L6V7
---> V1L1(L23)V2V3(T123)L4L5V4T4(V56)L6V7
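
The identification step can be sketched as a greedy, left-to-right longest-match search. This is only one possible reading of the description above, not Uniscribe's actual implementation. The name group_longest() is hypothetical, and COMPOSABLE holds just three sequences copied from Appendix B; a real table would contain the complete list.

```python
# Sketch only: three sample sequences from Appendix B, not the full table.
COMPOSABLE = {
    (0x1107, 0x1109, 0x1110),  # leading consonants
    (0x1169, 0x1163, 0x1175),  # vowels
    (0x11AF, 0x11B7, 0x11C2),  # trailing consonants
}
MAX_LEN = max(len(seq) for seq in COMPOSABLE)


def group_longest(codepoints):
    """Group the longest composable runs; all other characters stay single."""
    groups, i = [], 0
    while i < len(codepoints):
        # Try the longest candidate first, down to a run of two characters.
        for n in range(min(MAX_LEN, len(codepoints) - i), 1, -1):
            candidate = tuple(codepoints[i:i + n])
            if candidate in COMPOSABLE:
                groups.append(candidate)
                i += n
                break
        else:
            groups.append((codepoints[i],))
            i += 1
    return groups


run = [0x1107, 0x1109, 0x1110, 0x1169, 0x1163, 0x1175, 0x11AF, 0x11B7, 0x11C2]
for group in group_longest(run):
    print(" + ".join(f"U+{cp:04X}" for cp in group))
```

Each returned group corresponds to one parenthesized unit in the example above; the replacement of a group by a single Old Hangul Jamo glyph is left out of the sketch.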

Analyze the Syllables

The syllable unit that the shaping engine receives for shaping is a sequence of Unicode characters. Since each Hangul syllable has the canonical format LVT, the fillers Lf and Vf are added to the registered Jamo sequence where required, to convert each syllable to canonical form. The shaping engine then flags each of these for appropriate feature processing. OTLS is then called to perform OpenType layout processing for each syllable in turn.
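
A minimal sketch of the filler step follows, assuming the syllable has already been split into its leading, vowel and trailing parts. The function name to_canonical() and its signature are hypothetical; U+115F and U+1160 are the Unicode choseong and jungseong fillers, which also head the Appendix B tables.

```python
LEADING_FILLER = "\u115f"  # Lf, HANGUL CHOSEONG FILLER
VOWEL_FILLER = "\u1160"    # Vf, HANGUL JUNGSEONG FILLER


def to_canonical(leading: str, vowel: str, trailing: str = "") -> str:
    """Return L V [T], substituting a filler for a missing L or V part."""
    return (leading or LEADING_FILLER) + (vowel or VOWEL_FILLER) + trailing


# A vowel with no leading consonant gets Lf in front of it.
print([f"U+{ord(c):04X}" for c in to_canonical("", "\u1161")])
```

No filler is added for a missing trailing consonant, since the trailing position is optional.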

It is important to note that if any of the Jamo sequences being analyzed is capable of forming a Modern Hangul Syllable, the shaping engine does not apply OpenType features to shape them. Composition of Modern Hangul syllables is expected to be done using the pre-composed section (U+AC00 – U+D7AF), as described in the Unicode Standard.
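
Whether a Jamo sequence can form a Modern Hangul syllable can be tested with the syllable composition arithmetic given in the Unicode Standard. The sketch below assumes a single L, a single V and an optional T; the function name compose_modern() is hypothetical, while the base values and counts are those of the Unicode algorithm.

```python
from typing import Optional

L_BASE, V_BASE, T_BASE, S_BASE = 0x1100, 0x1161, 0x11A7, 0xAC00
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # T_COUNT includes "no trailing"


def compose_modern(l: int, v: int, t: Optional[int] = None) -> Optional[str]:
    """Return the pre-composed syllable for L V [T], or None if there is none."""
    l_index, v_index = l - L_BASE, v - V_BASE
    t_index = 0 if t is None else t - T_BASE
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        return None  # at least one Jamo lies outside the Modern Hangul set
    if t is not None and t_index == 0:
        return None  # U+11A7 itself is not a trailing consonant
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


print(compose_modern(0x1112, 0x1161, 0x11AB))  # Modern Hangul: pre-composed
print(compose_modern(0x1113, 0x1161))          # Old Hangul leading Jamo: None
```

A sequence for which this returns None is a candidate for Old Hangul shaping with the features described below.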

Shaping with OTLS

The first step Uniscribe takes in shaping the character string is to map all characters to their nominal form glyphs.

Next, Uniscribe calls the OTL Services Library to shape the Old Hangul syllable. All OTL processing is divided into a set of predefined features (described and illustrated in the Features section of this document). Each feature is applied, one by one, to the appropriate glyphs in the syllable and OTLS processes them. Uniscribe makes as many calls to the OTL Services as there are features. This ensures that the features are executed in the desired order.

The steps of the shaping process are outlined below.

Shaping features:

Language forms
1. Apply feature 'ccmp' to preprocess any glyphs that require composition
2. Apply feature 'ljmo' to get the leading consonant Jamo
3. Apply feature 'vjmo' to get the vowel Jamo
4. Apply feature 'tjmo' to get the trailing consonant Jamo
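
The fixed ordering can be pictured as one pass over the glyph run per feature. The sketch below is an illustration only: shape() and the toy feature functions are hypothetical stand-ins, not the OTLS interface.

```python
FEATURE_ORDER = ("ccmp", "ljmo", "vjmo", "tjmo")


def shape(glyphs, features):
    """Apply each available feature in the fixed order, one pass per feature."""
    for tag in FEATURE_ORDER:
        apply_feature = features.get(tag)
        if apply_feature is not None:  # a font may not define every feature
            glyphs = apply_feature(glyphs)
    return glyphs


# Toy stand-ins that only record which feature ran, registered in reverse
# order to show that the dictionary order does not affect the result.
toy_features = {
    tag: (lambda glyphs, tag=tag: glyphs + [tag])
    for tag in reversed(FEATURE_ORDER)
}
print(shape([], toy_features))
```

Running the toy example lists the tags in the order 'ccmp', 'ljmo', 'vjmo', 'tjmo', regardless of how the features were registered.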

Handling Invalid Combining Marks

Combining marks and signs that appear in text not in conjunction with a valid consonant base are considered invalid. When an invalid combination of letters is encountered, Uniscribe simply starts a new syllable/cluster.

Please note that to render a sign standalone (in apparent isolation from any base), one should apply it to a space (see section 2.5 'Combining Marks' of Unicode Standard 3.1). Uniscribe requires a ZWJ to be placed between the space and the mark for them to combine into a standalone sign.

Illustration that shows the Unicode characters zero width non joiner and zero width joiner with suggested glyphs.

While not required for OpenType functionality, the ZWJ (zero width joiner; U+200D), the ZWNJ (zero width non-joiner; U+200C) and the ZWSP (zero width space; U+200B) are recommended for inclusion in Korean Hangul fonts.
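
The three code points can be checked against the Unicode character database with Python's standard unicodedata module. The helper standalone_mark() is a hypothetical name; it simply builds the space + ZWJ + mark sequence that Uniscribe is described as requiring.

```python
import unicodedata

RECOMMENDED = {"ZWNJ": "\u200c", "ZWJ": "\u200d", "ZWSP": "\u200b"}

for label, ch in RECOMMENDED.items():
    print(f"{label}: U+{ord(ch):04X} {unicodedata.name(ch)}")


def standalone_mark(mark: str) -> str:
    """Return space + ZWJ + mark, for rendering a mark without a base."""
    return " " + RECOMMENDED["ZWJ"] + mark
```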

Features

The features listed below have been defined to create the basic forms for the languages that are supported on Korean Hangul systems. Regardless of the model an application chooses for supporting layout of complex scripts, Uniscribe requires a fixed order for executing features within a run of text to consistently obtain the proper basic form. This is achieved by calling features one-by-one in the standard order listed below.

The order of the lookups within each feature is also very important. For more information on lookups and defining features in OpenType fonts, see the Encoding section of the OpenType Development document.

The standard order for applying Korean Hangul features encoded in OpenType fonts:

Feature	Feature function	Layout operation	Required
Language based forms:
ccmp	Character composition/decomposition substitution	GSUB
ljmo	Leading consonant Jamo	GSUB	X
vjmo	Vowel Jamo	GSUB	X
tjmo	Trailing consonant Jamo	GSUB	X
[GSUB = glyph substitution, GPOS = glyph positioning]

Descriptions and examples of above features

Character composition (and decomposition)

Feature Tag: "ccmp"

The 'ccmp' feature is used to compose a number of glyphs into one glyph (GSUB lookup type 4). This feature is applied before any other feature because a font vendor may want to control the shaping of certain glyphs.

This feature permits the composition of Old Hangul Jamos corresponding to the sequences described in Appendix B. To compose Old Hangul syllables, the resulting Jamo glyphs are then substituted with the appropriate forms by the 'ljmo', 'vjmo' and 'tjmo' features. The 'ccmp' feature should come before any other feature, so that these compositions are given topmost priority. It is applicable to each of the Leading, Vowel and Trailing Jamo sequences.

For example, the sequence of leading Jamos below (U+1107 + U+1109 + U+1110) is composed into a single glyph by the 'ccmp' feature.
Illustration that shows U 1 1 0 7, U 1 1 0 9, and U 1 1 1 0 as separate glyphs and then combined into one glyph.
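
One way to author such a composition is as a ligature substitution rule in a feature file. The sketch below generates a rule in AFDKO feature-file syntax; this authoring format is an assumption here, not something this document prescribes. The uniXXXX component names follow the common glyph naming convention, and the ligature glyph name pieup_sios_thieuth is purely hypothetical.

```python
def ccmp_rule(codepoints, ligature_name):
    """Return one ligature substitution line (GSUB lookup type 4)."""
    components = " ".join(f"uni{cp:04X}" for cp in codepoints)
    return f"    sub {components} by {ligature_name};"


def ccmp_feature(rules):
    """Wrap the substitution lines in a 'ccmp' feature block."""
    lines = ["feature ccmp {"]
    lines += [ccmp_rule(cps, name) for cps, name in rules]
    lines.append("} ccmp;")
    return "\n".join(lines)


print(ccmp_feature([((0x1107, 0x1109, 0x1110), "pieup_sios_thieuth")]))
```

A full font would carry one such rule for every multi-character sequence in Appendix B.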

Leading consonant Jamo

Feature Tag: "ljmo"

The 'ljmo' feature is used to substitute the correct shape of a leading consonant Jamo for a Hangul syllable. The shaping of leading consonant Jamos is context based and depends on whether the leading Jamo is followed by a vowel Jamo alone or a sequence of vowel and trailing Jamo.

For example: the leading Jamo (U+1113) is replaced by the correct leading form when followed by a vowel Jamo alone.
Illustration that shows U 1 1 1 3 with an arrow pointing to the correct leading form which consists of the same character, but the shape is altered.

Vowel Jamo

Feature Tag: "vjmo"

The 'vjmo' feature is used to substitute the correct shape of a vowel Jamo for a Hangul syllable. The shaping of vowel Jamos is context based and depends on whether the vowel Jamo is preceded by a leading Jamo alone, or is preceded by a leading Jamo and followed by a trailing Jamo.

For example: the Hangul vowel Jungseong Ae (U+1162) is replaced by the correct form when preceded by a leading Jamo alone.
Illustration that shows U 1 1 6 2 with an arrow pointing to the correct form which is narrower, the horizontal lines are more spread out, and all of the lines are thicker.

Trailing consonant Jamo

Feature Tag: "tjmo"

The 'tjmo' feature is used to substitute the correct shape of a trailing consonant Jamo for a Hangul syllable. The shaping of trailing consonant Jamos is context based and depends on whether the trailing Jamo is preceded by a leading Jamo filler and vowel Jamo or by a leading Jamo and vowel Jamo.

For example: U+11C7 is replaced by the correct trailing consonant when preceded by a leading Jamo and vowel Jamo.
Illustration that shows U 1 1 C 7 with an arrow pointing to the correct trailing consonant which is a smaller version of the original.
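
The context rules of the three Jamo features above can be summarized in a small sketch. It assumes a syllable already in canonical L V [T] form and picks variant glyph names by context. The function select_forms() and the suffix scheme are hypothetical, and a production font is likely to distinguish more contexts than the two per feature described here.

```python
LEADING_FILLER = 0x115F  # Lf


def select_forms(leading: int, vowel: int, trailing=None):
    """Return hypothetical variant glyph names for one L V [T] syllable."""
    has_trailing = trailing is not None
    # 'ljmo' and 'vjmo': the shape depends on whether a trailing Jamo follows.
    context = "lvt" if has_trailing else "lv"
    forms = [f"uni{leading:04X}.ljmo_{context}",
             f"uni{vowel:04X}.vjmo_{context}"]
    if has_trailing:
        # 'tjmo': the shape depends on whether the leading Jamo is the filler.
        t_context = "after_lf" if leading == LEADING_FILLER else "after_l"
        forms.append(f"uni{trailing:04X}.tjmo_{t_context}")
    return forms


print(select_forms(0x1113, 0x1162))
print(select_forms(0x115F, 0x1161, 0x11C7))
```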

More Examples

1. Old Hangul Jamo containing leading consonants, vowels and trailing Jamos.

Input sequence: This sequence consists of: Choseong Pieup, Choseong Sios, Choseong Thieuth, Jungseong O, Jungseong Ya, Jungseong I, Jongseong Rieul, Jongseong Mieum, Jongseong Hieuh.
Illustration that shows a sequence of nine jamos in three groups. The first group is the leading consonant, or choseong, jamos. The second group is the vowel, or jungseong, jamos. The third group is the final consonant, or jongseong, jamos.

'ccmp' feature applied:
Illustration of the glyph sequence after the C C M P feature has been applied. The three leading jamo glyphs have been substituted by one combined jamo glyph. The three vowel jamo glyphs have been substituted by one combined jamo glyph. The three trailing jamo glyphs have been substituted by one combined jamo glyph.

'ljmo', 'vjmo' and 'tjmo' features applied:
Illustration that shows the complete syllable after the L J M O feature, the V J M O feature, and the T J M O feature have been applied. Each of the three glyphs in the sequence has been substituted with a variant glyph with the required size and position needed for the combination.

2. Leading consonant Jamo + vowel Jamo + trailing Jamo.

Input sequence: This sequence consists of: Choseong Ssangkiyeok, Jungseong A, Jongseong Nieun-Sios.
Illustration that shows the sequence Choseong Ssangkiyeok plus Jungseong A plus Jongseong Nieun-Sios.

'ljmo', 'vjmo' and 'tjmo' features applied:
Illustration that shows variants of the three glyphs after the L J M O feature, the V J M O feature, and the T J M O feature have been applied. An arrow points to an illustration of how the three glyphs combine to form the syllable.

3. Leading consonant Jamo + vowel Jamo

Input sequence: This sequence consists of: Choseong Nieun-Kiyeok, Jungseong Ae.
Illustration that shows the sequence of two glyphs, Choseong Nieun Kiyeok plus Jungseong Ae.

'ljmo' and 'vjmo' features applied:
Illustration that shows variants of the two glyphs after the L J M O feature and the V J M O feature have been applied. An arrow points to an illustration of how the two glyphs combine to form the syllable.

Appendices

Appendix A: Writing System Tags
Appendix B: Standard Composition for Old Hangul Jamos

Appendix A: Writing System Tags

Features are encoded according to both a designated script and language system. The language system tag specifies a typographic convention associated with a language or linguistic subgroup.

Currently, the Uniscribe engine only supports the "default" language for each script. However, font developers may want to build language specific features which are supported in other applications and will be supported in future Microsoft OpenType implementations.

NOTE: It is strongly recommended to include the "dflt" language tag in all OpenType fonts because it defines the basic script handling for a font. The "dflt" language system is used as the default if no other language specific features are defined or if the application does not support that particular language. If the "dflt" tag is not present for the script being used, the font may not work in some applications.

The following tables list the registered tag names for scripts and language systems.

Registered tags for the Korean Hangul script		Registered tags for Korean Hangul language systems
Script tag	Script	Language system tag	Language
"hang"	Korean Hangul	"dflt"	*default script handling
		"KOR "	Korean

Note: both the script and language tags are case sensitive (script tags should be lowercase; language tags are all caps, with the exception of "dflt") and must contain four characters (i.e., you must add a space to three-character language tags).
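
The padding and case rules can be checked mechanically. The helper names pad_tag() and check_hangul_tags() below are hypothetical and only illustrate the convention stated in the note.

```python
def pad_tag(tag: str) -> str:
    """Pad a tag with trailing spaces to exactly four characters."""
    if not 1 <= len(tag) <= 4:
        raise ValueError(f"tag must have 1 to 4 characters: {tag!r}")
    return tag.ljust(4)


def check_hangul_tags(script: str, language: str) -> bool:
    """Check the case convention for a script tag / language tag pair."""
    script, language = pad_tag(script), pad_tag(language)
    return script.islower() and (language == "dflt" or language.isupper())


print(repr(pad_tag("KOR")))
print(check_hangul_tags("hang", "KOR"), check_hangul_tags("hang", "kor"))
```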

Appendix B: Standard Composition for Old Hangul Jamos

Leading Consonants
Code point	Glyph		Code point	Glyph		Code point	Glyph
U+115F
U+1100
U+1101
U+1102
U+1113
U+1114
U+1115
U+1116
U+1102		+	U+1109
U+1102		+	U+110C
U+1102		+	U+1112
U+1103
U+1117
U+1104
U+1103		+	U+1105
U+1103		+	U+1106
U+1103		+	U+1107
U+1103		+	U+1109
U+1103		+	U+110C
U+1105
U+1105		+	U+1100
U+1105		+	U+1100		+	U+1100
U+1118
U+1105		+	U+1103
U+1105		+	U+1103		+	U+1103
U+1119
U+1105		+	U+1106
U+1105		+	U+1107
U+1105		+	U+1107		+	U+1107
U+1105		+	U+112B
U+1105		+	U+1109
U+1105		+	U+110C
U+1105		+	U+110F
U+111A
U+111B
U+1106
U+1106		+	U+1100
U+1106		+	U+1103
U+111C
U+1106		+	U+1109
U+111D
U+1107
U+111E
U+111F
U+1120
U+1108
U+1121
U+1122
U+1123
U+1124
U+1125
U+1126
U+1107		+	U+1109		+	U+1110
U+1127
U+1128
U+1107		+	U+110F
U+1129
U+112A
U+1107		+	U+1112
U+112B
U+112C
U+1109
U+112D
U+112E
U+112F
U+1130
U+1131
U+1132
U+1133
U+110A
U+1109		+	U+1109		+	U+1107
U+1134
U+1135
U+1136
U+1137
U+1138
U+1139
U+113A
U+113B
U+113C
U+113D
U+113E
U+113F
U+1140
U+110B
U+1141
U+1142
U+110B		+	U+1105
U+1143
U+1144
U+1145
U+1146
U+1147
U+1148
U+1149
U+114A
U+114B
U+110B		+	U+1112
U+114C
U+110C
U+114D
U+110D
U+110C		+	U+110C		+	U+1112
U+114E
U+114F
U+1150
U+1151
U+110E
U+1152
U+1153
U+1154
U+1155
U+110F
U+1110
U+1110		+	U+1110
U+1111
U+1156
U+1111		+	U+1112
U+1157
U+1112
U+1112		+	U+1109
U+1158
U+1159
U+1159		+	U+1159

Vowels
Code point	Glyph		Code point	Glyph		Code point	Glyph
U+1160
U+1161
U+1176
U+1177
U+1161		+	U+1173
U+1162
U+1163
U+1178
U+1179
U+1163		+	U+116E
U+1164
U+1165
U+117A
U+117B
U+117C
U+1166
U+1167
U+1167		+	U+1163
U+117D
U+117E
U+1168
U+1169
U+116A
U+116B
U+1169		+	U+1163
U+1169		+	U+1163		+	U+1175
U+117F
U+1180
U+1169		+	U+1167
U+1181
U+1182
U+1169		+	U+1169		+	U+1175
U+1183
U+116C
U+116D
U+116D		+	U+1161
U+116D		+	U+1161		+	U+1175
U+1184
U+1185
U+116D		+	U+1165
U+1186
U+1187
U+1188
U+116E
U+1189
U+118A
U+116F
U+118B
U+1170
U+116E		+	U+1167
U+118C
U+118D
U+1171
U+116E		+	U+1175		+	U+1175
U+1172
U+118E
U+1172		+	U+1161		+	U+1175
U+118F
U+1190
U+1191
U+1192
U+1172		+	U+1169
U+1193
U+1194
U+1173
U+1173		+	U+1161
U+1173		+	U+1165
U+1173		+	U+1165		+	U+1175
U+1173		+	U+1169
U+1195
U+1196
U+1174
U+1197
U+1175
U+1198
U+1199
U+1175		+	U+1163		+	U+1169
U+1175		+	U+1163		+	U+1175
U+1175		+	U+1167
U+1175		+	U+1167		+	U+1175
U+119A
U+1175		+	U+1169		+	U+1175
U+1175		+	U+116D
U+119B
U+1175		+	U+1172
U+119C
U+1175		+	U+1175
U+119D
U+119E
U+119E		+	U+1161
U+119F
U+119E		+	U+1165		+	U+1175
U+11A0
U+11A1
U+11A2

Trailing Consonants
Code point	Glyph		Code point	Glyph		Code point	Glyph
U+11A8
U+11A9
U+11A8		+	U+11AB
U+11C3
U+11A8		+	U+11B8
U+11AA
U+11C4
U+11A8		+	U+11BE
U+11A8		+	U+11BF
U+11A8		+	U+11C2
U+11AB
U+11C5
U+11AB		+	U+11AB
U+11C6
U+11AB		+	U+11AF
U+11C7
U+11C8
U+11AC
U+11AB		+	U+11BE
U+11C9
U+11AD
U+11AE
U+11CA
U+11AE		+	U+11AE
U+11AE		+	U+11AE		+	U+11B8
U+11CB
U+11AE		+	U+11B8
U+11AE		+	U+11BA
U+11AE		+	U+11BA		+	U+11A8
U+11AE		+	U+11BD
U+11AE		+	U+11BE
U+11AE		+	U+11C0
U+11AF
U+11B0
U+11AF		+	U+11A8		+	U+11A8
U+11CC
U+11AF		+	U+11A8		+	U+11C2
U+11CD
U+11CE
U+11CF
U+11D0
U+11AF		+	U+11AF		+	U+11BF
U+11B1
U+11D1
U+11D2
U+11AF		+	U+11B7		+	U+11C2
U+11B2
U+11AF		+	U+11B8		+	U+11AE
U+11D3
U+11AF		+	U+11B8		+	U+11C1
U+11D4
U+11D5
U+11B3
U+11D6
U+11D7
U+11AF		+	U+11F0
U+11D8
U+11B4
U+11B5
U+11B6
U+11D9
U+11AF		+	U+11F9		+	U+11C2
U+11AF		+	U+11BC
U+11B7
U+11DA
U+11B7		+	U+11AB
U+11B7		+	U+11AB		+	U+11AB
U+11DB
U+11B7		+	U+11B7
U+11DC
U+11B7		+	U+11B8		+	U+11BA
U+11DD
U+11DE
U+11DF
U+11B7		+	U+11BD
U+11E0
U+11E1
U+11E2
U+11B8
U+11B8		+	U+11AE
U+11E3
U+11B8		+	U+11AF		+	U+11C1
U+11B8		+	U+11B7
U+11B8		+	U+11B8
U+11B9
U+11B8		+	U+11BA		+	U+11AE
U+11B8		+	U+11BD
U+11B8		+	U+11BE
U+11E4
U+11E5
U+11E6
U+11BA
U+11E7
U+11E8
U+11E9
U+11BA		+	U+11B7
U+11EA
U+11BA		+	U+11E6
U+11BB
U+11BA		+	U+11BA		+	U+11A8
U+11BA		+	U+11BA		+	U+11AE
U+11BA		+	U+11EB
U+11BA		+	U+11BD
U+11BA		+	U+11BE
U+11BA		+	U+11C0
U+11BA		+	U+11C2
U+11EB
U+11EB		+	U+11B8
U+11EB		+	U+11E6
U+11BC
U+11EC
U+11ED
U+11BC		+	U+11B7
U+11BC		+	U+11BA
U+11EE
U+11EF
U+11BC		+	U+11C2
U+11F0
U+11F0		+	U+11A8
U+11F1
U+11F2
U+11F0		+	U+11BF
U+11F0		+	U+11C2
U+11BD
U+11BD		+	U+11B8
U+11BD		+	U+11B8		+	U+11B8
U+11BD		+	U+11BD
U+11BE
U+11BF
U+11C0
U+11C1
U+11F3
U+11C1		+	U+11BA
U+11C1		+	U+11C0
U+11F4
U+11C2
U+11F5
U+11F6
U+11F7
U+11F8
U+11F9
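
To use these tables programmatically, each row can be reduced to a tuple of code points. The sketch below assumes rows in the "U+XXXX + U+XXXX" form shown above; parse_row() is a hypothetical helper name.

```python
import re

CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def parse_row(row: str):
    """Return the code points on one table row as a tuple of integers."""
    return tuple(int(hex_digits, 16) for hex_digits in CODE_POINT.findall(row))


rows = ["U+1105\t\t+\tU+1100\t\t+\tU+1100", "U+11F9"]
# Rows with more than one code point are the composable sequences.
sequences = {parse_row(r) for r in rows if len(parse_row(r)) > 1}
print(sequences)
```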