Convert a Devanagari font to Unicode / OTL

Whichever way you look at it, having only Mangal is hardly sufficient for all text processing purposes and for all the languages that use the script. We would like to see many more Unicode Devanagari fonts appear, and hope that appropriate standards will emerge along the way.

So, suppose you have a font that was used to print Devanagari in some proprietary encoding. What does it take to convert it to Unicode? Where do you start?

There are several steps involved, and this posting briefly outlines them. For more details, refer to the Unicode Standard, the OpenType specification and the VOLT release notes. Different approaches are of course possible, so please take my writings with a grain of salt: they reflect only one person's limited experience.

Glyph set

  1. Examine your font and make sure it covers the Unicode range for Devanagari. There should be at least one glyph for each code point in the Devanagari range. If, for example, your encoding assumed that DEVANAGARI LETTER O is coded as DEVANAGARI LETTER A plus DEVANAGARI VOWEL SIGN O, you would probably need to create a new glyph for DEVANAGARI LETTER O.

    As a result of this step, you may be creating several new glyphs that are, in fact, composites of already existing shapes (so you don't really need to design new shapes for them).

    You may end up creating composite glyphs from already existing shapes for the steps below as well. In fact, creating composites in your font as an alternative to complex OTL processing may significantly decrease the complexity of your VOLT tables, which is a good thing. So whenever you face a choice between introducing a new complex (e.g. context-based) lookup and adding a dozen new composite glyphs to the font, I suggest you choose the latter. (Of course, if you would need to create several hundred new glyphs, it's different.)

  2. Make sure your font has all the forms prescribed by the "Creating and supporting OpenType fonts for Devanagari" specification. E.g. (this list is most probably incomplete):

    • you should have glyphs for half forms for all consonants (for those that do not have a distinctive half form, e.g. TTA or TTHA, use a composite with halant)
    • nukta forms should be made as single glyphs as well
    • make sure you have glyphs for vattu ligatures (ligatures with below-base Ra) for both full and half forms, with or without nuktas.
    • add glyphs for full and half forms of akhand forms: Kssa and Dnya.

    This step will, again, most probably require you to produce quite a few new composite glyphs. The gain is the simplicity of your VOLT tables and the help you get from Uniscribe, which controls the application of those features.

  3. It is also a good time to revisit your choice of conjuncts and alternate letterforms.
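The coverage check from step 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `mapped` set stands in for the code points your font already covers (reading them out of a real font file is not shown), and the function name is made up for this example. The list of assigned code points comes from Python's `unicodedata` module, so it reflects the Unicode version bundled with your interpreter.

```python
import unicodedata

# The Devanagari block in Unicode is U+0900..U+097F.
DEVANAGARI_BLOCK = range(0x0900, 0x0980)


def missing_devanagari(mapped_code_points):
    """Return assigned Devanagari code points that have no glyph yet."""
    missing = []
    for cp in DEVANAGARI_BLOCK:
        # unicodedata.name() returns the default "" for unassigned
        # code points, which are skipped here.
        if unicodedata.name(chr(cp), "") and cp not in mapped_code_points:
            missing.append(cp)
    return missing


# Hypothetical font that so far covers only LETTER A and VOWEL SIGN O.
mapped = {0x0905, 0x094B}
for cp in missing_devanagari(mapped)[:5]:
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```

In this hypothetical case U+0913 DEVANAGARI LETTER O shows up in the result, which is exactly the situation described in step 1: the letter needs a glyph of its own, most likely a composite.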

Now, you can load your font into VOLT and proceed with conversion.

Assign the right Unicode values to glyphs (in VOLT)

Make sure that each Unicode code point from the Devanagari range is assigned the correct glyph from your font. If your font had a CMAP table for your proprietary encoding, the 'Unicode' fields for the glyphs will contain the wrong values. You will need to erase those and type in the Unicode values instead. Glyphs that have no Unicode code point assigned to them should have the Unicode field left blank ("---"). They will be reached through the application of OT features.
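If you keep a table of glyph names and their intended Unicode values outside VOLT, a short script can sanity-check it before you type anything in. The following is a minimal sketch, not a VOLT feature: the glyph names (`ka`, `ka_half`, `halant`) and the function name are hypothetical, and `None` stands for a Unicode field that is left blank.

```python
def assign_unicode(glyph_names, unicode_by_name):
    """Map every glyph to a Unicode value, or to None for a blank field.

    Raises ValueError if one code point is claimed by two glyphs,
    since a code point can map to only one glyph.
    """
    assignment = {}
    owner = {}  # code point -> glyph name that already claimed it
    for name in glyph_names:
        cp = unicode_by_name.get(name)
        if cp is not None:
            if cp in owner:
                raise ValueError(
                    f"U+{cp:04X} assigned to both {owner[cp]} and {name}"
                )
            owner[cp] = name
        assignment[name] = cp
    return assignment


# Hypothetical glyph names; only 'ka' and 'halant' are encoded directly.
glyphs = ["ka", "ka_half", "halant"]
table = {"ka": 0x0915, "halant": 0x094D}
for name, cp in assign_unicode(glyphs, table).items():
    print(name, "---" if cp is None else f"U+{cp:04X}")
```

The half form `ka_half` keeps a blank field here because it is meant to be reached through a feature rather than typed directly.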

Name your glyphs (in VOLT)

If you are doing multiple fonts, it is easy to share names through the export/import glyph definitions feature. By sticking to a single naming convention you can share glyph, group and lookup definitions between fonts.

Make substitution lookups for standard linguistic features

These are the lookups for akhands (2 substitutions for Devanagari), nukta forms, half forms, reph (1 substitution), below-base forms (ra-vattu only, so this lookup will have only 1 substitution) and vattu variants (ligatures with ra-vattu).

Note

These lookups are standard: what goes in there is pre-determined. They can 100% be shared between your fonts as long as you use the same naming convention. E.g. the lookup for half-forms ('half' feature) has substitutions of type:

<name for the full-form> halant -> <name for the half-form>

and there is really no deviation from it. Uniscribe controls the application of these features so you do not need to worry when they need to be applied: the feature will only be applied to the consonants that need to be in half form, and so on. (Again, please see the specification for details on each feature type).
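Because the half-form substitutions follow one fixed pattern, they can be generated from the glyph names instead of being typed by hand. The sketch below assumes a naming convention in which the half form of a glyph `X` is called `X_half` and the halant glyph is called `halant`; both names are illustrative, and the output uses the arrow notation of this posting rather than any actual VOLT file syntax.

```python
def half_form_rules(consonants, halant="halant", suffix="_half"):
    """Build '<full form> halant -> <half form>' rules as plain text."""
    return [f"{c} {halant} -> {c}{suffix}" for c in consonants]


# Hypothetical consonant glyph names.
for rule in half_form_rules(["ka", "kha", "ga"]):
    print(rule)
```

This is also why a shared naming convention pays off: the same list of consonant names produces the same lookup for every font that follows the convention.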

Now you can create appropriate features under Script: Devanagari, Language System: Default and link them to the lookups you have created.

Create a ligature lookup for your conjuncts and input all of them

A typical ligature for a full-form conjunct would look like

"name for consonant 1 in half form"
"name for consonant 2" ->
"name for conjunct consonant1-consonant2"

When arranging these ligatures, follow 2 rules:

  • input longer sequences first, and
  • input substitutions for full form conjuncts before those for half-form conjuncts

Observance of these two rules will ensure linguistically correct usage of conjuncts.
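If you maintain your conjunct list outside VOLT, the two ordering rules can be applied mechanically with a sort. This is a sketch under assumptions: the `Ligature` record and the glyph names are invented for the example, and where the two rules could pull in different directions the sketch lets sequence length take priority, which the rules above do not spell out.

```python
from typing import NamedTuple


class Ligature(NamedTuple):
    components: tuple  # input glyph names, in order
    result: str        # name of the conjunct glyph
    is_half: bool      # True if the result is a half-form conjunct


def order_ligatures(ligatures):
    """Longer sequences first; within a length, full forms before half forms."""
    # sorted() is stable, so entries that tie keep their original order.
    # False sorts before True, which puts full forms ahead of half forms.
    return sorted(ligatures, key=lambda lig: (-len(lig.components), lig.is_half))


# Hypothetical conjuncts, deliberately listed in the wrong order.
ligs = [
    Ligature(("sa_half", "ta_half"), "sa_ta_half", True),
    Ligature(("sa_half", "ta"), "sa_ta", False),
    Ligature(("sa_half", "ta_half", "ya"), "sa_ta_ya", False),
]
for lig in order_ligatures(ligs):
    print(" ".join(lig.components), "->", lig.result)
```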

Now, create a feature for pre-base substitutions ('pres') and link your new lookup to it.

At this point, if you compile and save your font, it is already capable of producing legible text! Do save it and try it out!

Typographical fine-tuning

Your font is already producing linguistically correct forms; the only job left is to make sure it is pretty: that forms connect properly, do not clash, and so on. This can basically be done by using two techniques:

  • substituting alternate letterforms in presence of other glyphs. E.g. substituting the right form of short-i matra, or the right form of other matras on certain consonants, or ligating matras with vowel signs. Use features "pre-base substitutions", "above-base substitutions", "below-base substitutions" and "post-base substitutions" for this kind of processing.
  • positioning. The basic example is anchoring matras on full consonants, or adjusting their positions in presence of vowel signs. Use features "above-base marks" and "below-base marks" for these lookups.

What is nice about this step is that it is incremental: you can improve the quality of your font by adjusting positions or introducing new lookups for as long as you like. Please have a look at Mangal to see what lookups can go in here.

A good rule of thumb for typographical fine-tuning is: go through your glyph set and look for any glyphs (alternate forms) you have not "used" yet. Think of what situations they should appear in, and code these situations via contextual lookups. Then fine-tune with positioning.
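The search for glyphs you have not "used" yet amounts to a set difference: everything in the font, minus the glyphs that have a Unicode value, minus the glyphs that some lookup produces. A minimal sketch, assuming you can export those three lists of glyph names yourself (the function and the glyph names are made up for this example):

```python
def unused_glyphs(all_glyphs, cmap_glyphs, lookup_outputs):
    """Return glyphs that no code point and no lookup output reaches."""
    reachable = set(cmap_glyphs) | set(lookup_outputs)
    return sorted(set(all_glyphs) - reachable)


# Hypothetical glyph names.
font_glyphs = ["ka", "ka_half", "i_matra", "i_matra_wide"]
encoded = ["ka", "i_matra"]   # glyphs with a Unicode value
produced = ["ka_half"]        # glyphs that appear as lookup results
print(unused_glyphs(font_glyphs, encoded, produced))
```

Each name that comes back is a candidate for a new contextual lookup, or a sign that the glyph is not needed at all.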

If you had a font that could be used without OTL support to start with, very little will need to be done at this step (because the font has already been designed to work as is without adjustment). If the encoding was relying on the user choosing the right forms manually (e.g. choosing the right form of matra E on top of consonants like KA), now these rules need to be coded as lookups.