The terms Composite and Precomposed in the Windows docs give me trouble, so I suspect they give other people trouble.
Basically, Unicode provides two ways of encoding some characters. Sometimes a character can be encoded as a single code point (like Ä, or U+00C4), and other times as a sequence of code points (A + ̈, or U+0041 + U+0308).
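To make the two encodings concrete, here is a small Python sketch. The variable names and the `code_points` helper are made up for illustration; the point is only that the two strings render the same but hold different code points.

```python
# The same visible character, Ä, stored two different ways.
precomposed = "\u00C4"   # one code point: LATIN CAPITAL LETTER A WITH DIAERESIS
sequence = "A\u0308"     # two code points: A followed by COMBINING DIAERESIS


def code_points(text):
    """Hypothetical helper: list the code points of a string as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(precomposed))   # ['U+00C4']
print(code_points(sequence))      # ['U+0041', 'U+0308']

# A plain comparison looks at code points, not at what you see on screen.
print(precomposed == sequence)    # False
print(len(precomposed), len(sequence))   # 1 2
```

So anything that compares, measures, or indexes strings by code point will treat these two as different text unless something normalizes them first.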
Windows uses the terms “Precomposed” and “Composite” to name these two ideas. Unfortunately, Unicode defines both terms as "Decomposable Character", which is "A character that is equivalent to a sequence of one or more other characters...", i.e., the Ä form of the character.
I'd even argue that Microsoft messed up its English when it chose the word composite however long ago (probably my boss's boss's boss, so shhhhh:-). The dictionary I looked at said composite is “A structure or an entity made up of distinct components.”, which sounds to me like Ä. Sadly we chose to use composite to describe a “Combining Character Sequence” such as A + ̈. This is somewhat mitigated by the fact that we did this long ago when these technologies were still pretty new, but it doesn’t help the fact that I get confused every time I have to see these words. Since I’m the guy that maintains these APIs, I figure if I get confused by it, others must too :-)
So to summarize, when you see these words in the docs for Windows APIs:
Precomposed characters are characters like Ä (U+00C4) that use one code point to represent a single character.
Composite characters (in Windows documentation and constants) are sequences of code points like A + ̈ (U+0041 + U+0308) that use multiple code points to represent a single character shape.
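If it helps to see the two forms converted into each other, here is a rough Python sketch using the standard `unicodedata` module. Mapping "Precomposed" to Unicode normalization form C (NFC) and "Composite" to form D (NFD) is my assumption for illustration; it holds for this Ä example, but it is not a claim about what any particular Windows API does internally.

```python
import unicodedata

precomposed = "\u00C4"   # "Precomposed" in Windows terms: Ä as one code point
composite = "A\u0308"    # "Composite" in Windows terms: A + combining diaeresis

# NFD decomposes: the single code point becomes base letter + combining mark.
to_composite = unicodedata.normalize("NFD", precomposed)
print(to_composite == composite)       # True

# NFC composes: the sequence collapses back to the single code point.
to_precomposed = unicodedata.normalize("NFC", composite)
print(to_precomposed == precomposed)   # True

# The character database records the decomposition of U+00C4 directly.
print(unicodedata.decomposition(precomposed))   # 0041 0308
print(unicodedata.name("\u0308"))               # COMBINING DIAERESIS
```

The `decomposition` result is why Unicode calls U+00C4 a decomposable character: it is defined as equivalent to the U+0041 + U+0308 sequence.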
For what it's worth, Windows tends to generate characters in a Precomposed form when possible; however, even then, cut & paste and other operations can cause combining character sequences to occur.
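Since either form can show up in your input, one defensive approach is to normalize before comparing. The sketch below is only an illustration in Python: `has_combining_marks` and `same_text` are hypothetical helper names, and choosing NFC as the common form is an assumption, not a recommendation from the Windows docs.

```python
import unicodedata


def has_combining_marks(text):
    """Hypothetical helper: True if any code point is a combining mark
    (that is, its canonical combining class is nonzero)."""
    return any(unicodedata.combining(ch) for ch in text)


def same_text(a, b):
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


typed = "\u00C4rger"     # precomposed form, as a keyboard would likely produce it
pasted = "A\u0308rger"   # combining sequence, as might arrive via cut & paste

print(has_combining_marks(typed))    # False
print(has_combining_marks(pasted))   # True
print(typed == pasted)               # False
print(same_text(typed, pasted))      # True
```

Whether to normalize at all, and to which form, depends on what you are doing with the text. Normalization rewrites the code points, so apply it to data you need to compare rather than to data you must preserve exactly.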