ScriptItemize
8/28/2008
This function breaks a Unicode string into individually shapeable items.
Syntax
HRESULT WINAPI ScriptItemize(
  const WCHAR* pwcInChars,
  int cInChars,
  int cMaxItems,
  const SCRIPT_CONTROL* psControl,
  const SCRIPT_STATE* psState,
  SCRIPT_ITEM* pItems,
  int* pcItems
);
Parameters
- pwcInChars
[in] Pointer to a Unicode string to be itemized.
- cInChars
[in] Number of characters in pwcInChars to be itemized.
- cMaxItems
[in] Maximum number of SCRIPT_ITEM structures to process.
- psControl
[in] Pointer to a SCRIPT_CONTROL structure containing flags indicating the type of itemization to be performed. Use NULL if this is not needed.
- psState
[in] Pointer to a SCRIPT_STATE structure indicating the initial bidirectional algorithm state. Use NULL if this is not needed.
- pItems
[out] Pointer to a buffer to receive the SCRIPT_ITEM structures processed. The buffer pointed to by pItems should be cMaxItems * sizeof(SCRIPT_ITEM) bytes in length.
- pcItems
[out] Pointer to a variable to receive the number of SCRIPT_ITEM structures processed.
Return Value
If the function succeeds, the return value is zero.
If the function fails, it returns a nonzero value. The function returns E_INVALIDARG if pwcInChars is NULL, cInChars is 0, pItems is NULL, or cMaxItems is less than 2.
The function returns E_OUTOFMEMORY if the output buffer length (cMaxItems) is insufficient. In this case, as in all error cases, no items have been fully processed, so no part of the output array contains defined values.
If any other unrecoverable error is encountered, it is returned as an HRESULT.
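Because an undersized buffer is reported as E_OUTOFMEMORY and leaves the output undefined, a caller that cannot predict the item count may want to retry with a larger buffer. The following is a minimal sketch of that pattern, not the documented usage. So that it builds without the Windows headers, it uses stand-ins that are assumptions of this example: ItemStub in place of SCRIPT_ITEM, the ITEMIZE_* codes in place of HRESULT values, and a hypothetical ItemizeFn callback where real code would call ScriptItemize. The fake_itemize function exists only to exercise the loop.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Stand-ins so the sketch builds without the Windows headers. Real code
   would use HRESULT, E_OUTOFMEMORY and SCRIPT_ITEM instead. */
enum { ITEMIZE_OK = 0, ITEMIZE_OUTOFMEMORY = 1, ITEMIZE_INVALIDARG = 2 };
typedef struct { int iCharPos; } ItemStub;

/* Hypothetical callback carrying the cMaxItems/pItems/pcItems part of the
   ScriptItemize signature; a real one would forward to ScriptItemize. */
typedef int (*ItemizeFn)(void *ctx, int cMaxItems, ItemStub *pItems, int *pcItems);

/* Calls fn with a growing buffer until it stops reporting an undersized
   buffer. Returns a malloc'ed array (caller frees) or NULL on failure. */
ItemStub *itemize_with_retry(ItemizeFn fn, void *ctx, int cMaxItems, int *pcItems)
{
    assert(fn != NULL && pcItems != NULL);
    if (cMaxItems < 2)
        cMaxItems = 2;                 /* fewer than two entries is invalid */
    for (;;) {
        ItemStub *pItems = malloc((size_t)cMaxItems * sizeof *pItems);
        if (pItems == NULL)
            return NULL;
        int rc = fn(ctx, cMaxItems, pItems, pcItems);
        if (rc == ITEMIZE_OK)
            return pItems;
        free(pItems);                  /* contents are undefined on error */
        if (rc != ITEMIZE_OUTOFMEMORY || cMaxItems > INT_MAX / 2)
            return NULL;               /* other failure, or cannot grow further */
        cMaxItems *= 2;
    }
}

/* Fake itemizer for demonstration only: pretends the text has *(int *)ctx
   items of four characters each and needs one extra terminal entry. */
int fake_itemize(void *ctx, int cMaxItems, ItemStub *pItems, int *pcItems)
{
    int needed = *(int *)ctx;
    if (pItems == NULL || cMaxItems < 2)
        return ITEMIZE_INVALIDARG;
    if (cMaxItems < needed + 1)
        return ITEMIZE_OUTOFMEMORY;
    for (int i = 0; i <= needed; ++i)
        pItems[i].iCharPos = 4 * i;
    *pcItems = needed;
    return ITEMIZE_OK;
}
```

Doubling the capacity keeps the number of retries small; a caller with a reasonable estimate of the item count can pass it as the initial cMaxItems and usually succeed on the first attempt.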
Remarks
Items are delimited by either a change of shaping engine or a change of direction.
The client may create multiple runs from each SCRIPT_ITEM returned by ScriptItemize, but should not combine multiple items into a single run. The reason for this is that later the client will call ScriptShape for each run (when measuring or rendering), and must pass the SCRIPT_ANALYSIS structure that ScriptItemize returned. Each SCRIPT_ITEM contains a SCRIPT_ANALYSIS structure.
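The rule that runs may subdivide an item but never span two items can be sketched as follows. This is an illustration under stated assumptions rather than part of the API: ItemStub stands in for SCRIPT_ITEM (only iCharPos is modelled), Run and build_runs are hypothetical names, and the style-change offsets in pBreaks are assumed to be sorted in ascending order. The array is assumed to hold cItems entries followed by a terminal entry.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int iCharPos; } ItemStub;   /* stand-in for SCRIPT_ITEM */

typedef struct {
    int start;   /* index of the first character of the run */
    int length;  /* number of characters in the run */
    int item;    /* index of the item whose SCRIPT_ANALYSIS the run uses */
} Run;

/* Splits each item at the style-change offsets in pBreaks. A run always
   ends at an item boundary, so no run combines two items. Returns the
   number of runs written, or -1 if pRuns is too small. */
int build_runs(const ItemStub *pItems, int cItems,
               const int *pBreaks, int cBreaks,
               Run *pRuns, int cMaxRuns)
{
    int cRuns = 0;
    int b = 0;                        /* next style break to consider */

    assert(pItems != NULL && pRuns != NULL);
    for (int i = 0; i < cItems; ++i) {
        int pos = pItems[i].iCharPos;
        int end = pItems[i + 1].iCharPos;   /* next item or terminal entry */
        while (pos < end) {
            /* Skip breaks that do not fall strictly after pos. */
            while (b < cBreaks && pBreaks[b] <= pos)
                ++b;
            int next = (b < cBreaks && pBreaks[b] < end) ? pBreaks[b] : end;
            if (cRuns == cMaxRuns)
                return -1;
            pRuns[cRuns].start = pos;
            pRuns[cRuns].length = next - pos;
            pRuns[cRuns].item = i;
            ++cRuns;
            pos = next;
        }
    }
    return cRuns;
}
```

Each run records the index of its source item so that the matching SCRIPT_ANALYSIS can later be passed to ScriptShape for that run.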
If psControl and psState are NULL on entry, ScriptItemize breaks the Unicode string purely by character code. If both parameters are non-NULL, ScriptItemize performs a full Unicode bidirectional analysis.
The ScriptItemize function always adds a terminal item to the item analysis array (pItems), so that the length of any item is available as the difference between its iCharPos and the iCharPos of the entry that follows it. For example, when there is a single item, its length is:
pItems[1].iCharPos - pItems[0].iCharPos
For this reason, it is invalid to call ScriptItemize with a buffer of less than two SCRIPT_ITEM structures.
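The same subtraction generalizes to any item, because the terminal entry supplies an end position for the last one. A minimal sketch, assuming the array holds cItems items followed by the terminal entry; ItemStub is a stand-in for SCRIPT_ITEM that models only iCharPos, and item_length is a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for SCRIPT_ITEM: only the iCharPos member is modelled here.
   Real code would include usp10.h and use SCRIPT_ITEM directly. */
typedef struct {
    int iCharPos;   /* index of the item's first character in the string */
} ItemStub;

/* Length of item i in characters. Relies on the terminal entry at
   pItems[cItems], so i must be in the range [0, cItems). */
int item_length(const ItemStub *pItems, int cItems, int i)
{
    assert(pItems != NULL && i >= 0 && i < cItems);
    return pItems[i + 1].iCharPos - pItems[i].iCharPos;
}
```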
To perform a correct Unicode bidirectional analysis, the SCRIPT_STATE structure should be initialized according to the reading order at paragraph start, and ScriptItemize should be passed the whole paragraph.
The bidirectional stack is not large, just 16 bytes. It should be shared between calls.
If shaping is disabled (fDisableGlyphShape in SCRIPT_STATE), complex scripts are substituted by SCRIPT_UNDEFINED, causing shaping to be performed with contextual substitution following the one-to-one code point to glyph mapping provided by the font's cmap table. The rendering direction is still set appropriately.
European digits U+0030 through U+0039 may be rendered as national digits as shown in the following table.
fDigitSubstitute | fContextDigits | Digit shapes displayed for Unicode U+0030 through U+0039 |
---|---|---|
False | Any | Western (European / North American) digits |
True | False | As specified in SCRIPT_CONTROL.uDefaultLanguage. |
True | True | As prior strong text, defaulting to SCRIPT_CONTROL.uDefaultLanguage. |
Note that in context digit mode, any digits encountered before the first letters are rendered in the digit shapes of SCRIPT_CONTROL.uDefaultLanguage if that script runs in the same direction as the output, and in Western digits (also known as Arabic digits, as distinct from Arabic-Indic digits) if the direction is opposite. For example, if SCRIPT_CONTROL.uDefaultLanguage is LANG_ARABIC, initial digits are rendered as Arabic-Indic in an RTL embedding, but as Western in an LTR embedding.
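The digit substitution rules above can be read as a small decision function. The sketch below only restates the table and the note; it is not how ScriptItemize is implemented. DigitShapes, DigitInputs, and choose_digit_shapes are hypothetical names introduced for illustration, and the two extra inputs (whether strong text has been seen, and whether the default language's script matches the output direction) are assumptions about how a caller might model the context.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    DIGITS_WESTERN,            /* European / North American digits */
    DIGITS_DEFAULT_LANGUAGE,   /* shapes of SCRIPT_CONTROL.uDefaultLanguage */
    DIGITS_PRIOR_STRONG_TEXT   /* shapes follow the preceding strong text */
} DigitShapes;

/* Hypothetical bundle of the inputs that the table and note depend on. */
typedef struct {
    int fDigitSubstitute;      /* SCRIPT_STATE flag */
    int fContextDigits;        /* SCRIPT_CONTROL flag */
    int seenStrongText;        /* nonzero once a letter has been encountered */
    int defaultMatchesOutput;  /* default language script runs in the output direction */
} DigitInputs;

DigitShapes choose_digit_shapes(const DigitInputs *in)
{
    assert(in != NULL);
    if (!in->fDigitSubstitute)
        return DIGITS_WESTERN;              /* row 1: fContextDigits is ignored */
    if (!in->fContextDigits)
        return DIGITS_DEFAULT_LANGUAGE;     /* row 2 */
    if (in->seenStrongText)
        return DIGITS_PRIOR_STRONG_TEXT;    /* row 3 */
    /* Row 3, digits before the first letters: direction decides. */
    return in->defaultMatchesOutput ? DIGITS_DEFAULT_LANGUAGE : DIGITS_WESTERN;
}
```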
Effect of Unicode control characters on SCRIPT_STATE.
SCRIPT_STATE flag | Set by | Cleared by |
---|---|---|
fDigitSubstitute | NADS | NODS |
fInhibitSymSwap | ISS | ASS |
fCharShape | AAFS | IAFS |
The Unicode control characters are defined in the following table. For more information, see the Unicode Standard.
Unicode control characters | Description |
---|---|
NADS | Overrides Western digits (NODS) with national digit shapes. |
NODS | Nominal digit shapes, otherwise known as Western digits. See NADS. |
ASS | Activates swapping of symmetric pairs (for example, parentheses). For these characters, left and right are interpreted as opening and closing. This is the default. See ISS. |
ISS | Inhibits swapping of symmetric pairs. See ASS. |
AAFS | Activates Arabic form shaping, that is, ligatures or cursive connections, for Arabic presentation forms. See IAFS. |
IAFS | Inhibits Arabic form shaping, that is, ligatures and cursive connections, for Arabic presentation forms. Nominal Arabic characters are not affected. This is the default. See AAFS. |
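The way these control characters set and clear the three SCRIPT_STATE flags can be sketched as a simple switch. This is an illustration only. StateFlags is a stand-in for the relevant part of SCRIPT_STATE, and apply_control_char is a hypothetical helper. The code point values are not given on this page; they are the ones the Unicode Standard assigns to these characters (U+206A through U+206F).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the three SCRIPT_STATE flags that the control characters
   affect. Real code would use SCRIPT_STATE from usp10.h. */
typedef struct {
    unsigned fDigitSubstitute : 1;
    unsigned fInhibitSymSwap  : 1;
    unsigned fCharShape       : 1;
} StateFlags;

/* Code points as assigned by the Unicode Standard. */
enum {
    CH_ISS  = 0x206A,   /* inhibit symmetric swapping */
    CH_ASS  = 0x206B,   /* activate symmetric swapping */
    CH_IAFS = 0x206C,   /* inhibit Arabic form shaping */
    CH_AAFS = 0x206D,   /* activate Arabic form shaping */
    CH_NADS = 0x206E,   /* national digit shapes */
    CH_NODS = 0x206F    /* nominal digit shapes */
};

/* Updates *pState for one character. Returns 1 if ch was one of the six
   control characters, 0 if it was anything else (state is unchanged). */
int apply_control_char(StateFlags *pState, unsigned ch)
{
    assert(pState != NULL);
    switch (ch) {
    case CH_NADS: pState->fDigitSubstitute = 1; return 1;
    case CH_NODS: pState->fDigitSubstitute = 0; return 1;
    case CH_ISS:  pState->fInhibitSymSwap = 1;  return 1;
    case CH_ASS:  pState->fInhibitSymSwap = 0;  return 1;
    case CH_AAFS: pState->fCharShape = 1;       return 1;
    case CH_IAFS: pState->fCharShape = 0;       return 1;
    default:      return 0;
    }
}
```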
SCRIPT_STATE.fArabicNumContext controls the Unicode EN-AN rule. At the beginning of a paragraph it should normally be initialized to TRUE for an Arabic locale, FALSE for any other. The ScriptItemize function will update it as it processes strong text.
Requirements
Header | usp10.h |
Library | Uspce.lib |
Windows Embedded CE | Windows CE 5.0 and later |
See Also
Reference
ScriptGetProperties
SCRIPT_ANALYSIS
SCRIPT_ITEM
SCRIPT_CONTROL
SCRIPT_STATE