ScriptItemize

03/19/2010


Windows Mobile: Not Supported
Windows Embedded CE: Supported

8/28/2008

This function breaks a Unicode string into individually shapeable items.

Syntax

  
HRESULT WINAPI ScriptItemize(
  const WCHAR* pwcInChars,
  int cInChars,
  int cMaxItems,
  const SCRIPT_CONTROL* psControl,
  const SCRIPT_STATE* psState,
  SCRIPT_ITEM* pItems,
  int* pcItems
);

Parameters

pwcInChars
[in] Pointer to a Unicode string to be itemized.

cInChars
[in] Number of characters in pwcInChars to be itemized.

cMaxItems
[in] Maximum number of SCRIPT_ITEM structures to process.

psControl
[in] Pointer to a SCRIPT_CONTROL structure containing flags indicating the type of itemization to be performed. Use NULL if this is not needed.

psState
[in] Pointer to a SCRIPT_STATE structure indicating the initial bidirectional algorithm state. Use NULL if this is not needed.

pItems
[out] Pointer to a buffer to receive the SCRIPT_ITEM structures processed. The buffer pointed to by pItems should be cMaxItems * sizeof(SCRIPT_ITEM) bytes in length.

pcItems
[out] Pointer to a variable to receive the number of SCRIPT_ITEM structures processed.

Return Value

If the function succeeds, the return value is zero.

If the function fails, it returns a nonzero value. The function returns E_INVALIDARG if pwcInChars is NULL, cInChars is 0, pItems is NULL, or cMaxItems is less than 2.

The function returns E_OUTOFMEMORY if the output buffer length (cMaxItems) is insufficient. Note that in this case, as in all error cases, no items have been fully processed, so no part of the output array contains defined values.

If any other unrecoverable error is encountered, it is returned as an HRESULT.
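Because a failed call leaves nothing usable in the output array, a caller that cannot size the buffer in advance would typically discard it and call again with a larger one. The following C sketch illustrates that grow-and-retry pattern in portable form. It does not call the real API: ItemStub, ItemizeFn, the STUB_* codes, itemize_with_growth, and fake_itemize are all hypothetical stand-ins for SCRIPT_ITEM, ScriptItemize, and the HRESULT values. It also assumes that every item covers at least one character, so a buffer of cChars + 1 entries is always enough.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-ins for S_OK, E_OUTOFMEMORY and E_INVALIDARG. */
enum { STUB_OK = 0, STUB_OUTOFMEMORY = 1, STUB_INVALIDARG = 2 };

/* Hypothetical stand-in for SCRIPT_ITEM; only iCharPos is modelled. */
typedef struct { int iCharPos; } ItemStub;

/* Same shape as ScriptItemize, minus the psControl and psState parameters. */
typedef int (*ItemizeFn)(const wchar_t *text, int cChars, int cMaxItems,
                         ItemStub *items, int *pcItems);

/* Calls `itemize` with a growing buffer until it stops reporting
   STUB_OUTOFMEMORY. On success *pItems owns a malloc'd array holding
   *pcItems items plus the terminal entry; the caller frees it. */
int itemize_with_growth(ItemizeFn itemize, const wchar_t *text, int cChars,
                        ItemStub **pItems, int *pcItems)
{
    int cMaxItems = 2;              /* smallest buffer the function accepts */
    *pItems = NULL;
    *pcItems = 0;
    for (;;) {
        ItemStub *buf = malloc((size_t)cMaxItems * sizeof *buf);
        if (buf == NULL)
            return STUB_OUTOFMEMORY;
        int hr = itemize(text, cChars, cMaxItems, buf, pcItems);
        if (hr == STUB_OK) {
            *pItems = buf;
            return STUB_OK;
        }
        free(buf);                  /* contents are undefined after a failure */
        *pcItems = 0;
        /* Give up on any other error, or once the buffer already exceeds
           the assumed worst case of cChars items plus the terminal entry. */
        if (hr != STUB_OUTOFMEMORY || cMaxItems > cChars)
            return hr;
        cMaxItems *= 2;
    }
}

/* Hypothetical itemizer used only to exercise the loop:
   it produces one item for every three characters. */
int fake_itemize(const wchar_t *text, int cChars, int cMaxItems,
                 ItemStub *items, int *pcItems)
{
    if (text == NULL || cChars == 0 || items == NULL || cMaxItems < 2)
        return STUB_INVALIDARG;
    int needed = (cChars + 2) / 3;          /* number of real items */
    if (needed + 1 > cMaxItems)             /* +1 for the terminal entry */
        return STUB_OUTOFMEMORY;
    for (int i = 0; i < needed; ++i)
        items[i].iCharPos = i * 3;
    items[needed].iCharPos = cChars;        /* terminal entry */
    *pcItems = needed;
    return STUB_OK;
}
```

In a real Windows Embedded CE build, the callback would presumably be replaced by a direct call to ScriptItemize, with the comparison made against E_OUTOFMEMORY.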

Remarks

Items are delimited by either a change of shaping engine or a change of direction.

The client may create multiple runs from each SCRIPT_ITEM returned by ScriptItemize, but should not combine multiple items into a single run. The reason for this is that later the client will call ScriptShape for each run (when measuring or rendering), and must pass the SCRIPT_ANALYSIS structure that ScriptItemize returned. Each SCRIPT_ITEM contains a SCRIPT_ANALYSIS structure.

If psControl and psState are NULL on entry, ScriptItemize breaks the Unicode string purely by character code. If both parameters are non-NULL, ScriptItemize performs a full Unicode bidirectional analysis.

The ScriptItemize function always adds a terminal item to the item analysis array (pItems) such that the length of an item at pItem is always available as (in the case of one item):

pItem[1].iCharPos - pItem[0].iCharPos

For this reason, it is invalid to call ScriptItemize with a buffer of less than two SCRIPT_ITEM structures.
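The terminal item means the length of every item can be computed the same way, with no special case for the last one. The C sketch below shows the arithmetic on a hand-built array. ItemPos is a hypothetical stand-in that models only the iCharPos member of SCRIPT_ITEM, and item_length and total_length are illustrative helper names, not part of the API.

```c
#include <stddef.h>

/* Hypothetical stand-in for SCRIPT_ITEM: only iCharPos is modelled.
   The real structure also carries a SCRIPT_ANALYSIS. */
typedef struct {
    int iCharPos;   /* index of the item's first character */
} ItemPos;

/* Length in characters of item i, for 0 <= i < cItems.
   Reading items[i + 1] is valid because items[cItems] is the terminal entry. */
int item_length(const ItemPos *items, int i)
{
    return items[i + 1].iCharPos - items[i].iCharPos;
}

/* Sum of the lengths of all cItems items. */
int total_length(const ItemPos *items, int cItems)
{
    int total = 0;
    for (int i = 0; i < cItems; ++i)
        total += item_length(items, i);
    return total;
}
```

For three items starting at character positions 0, 5, and 9, followed by a terminal entry at position 12, the item lengths come out as 5, 4, and 3.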

To perform a correct Unicode bidirectional analysis, the SCRIPT_STATE structure should be initialized according to the reading order at paragraph start, and ScriptItemize should be passed the whole paragraph.

The bidirectional stack is not large, just 16 bytes. It should be shared between calls.

If shaping is disabled (fDisableGlyphShape in SCRIPT_STATE), complex scripts are substituted by SCRIPT_UNDEFINED, causing shaping to be performed with contextual substitution following the one-to-one code point to glyph mapping provided by the font's cmap table. The rendering direction is still set appropriately.

European digits U+0030 through U+0039 may be rendered as national digits as shown in the following table.

fDigitSubstitute	fContextDigits	Digit shapes displayed for Unicode U+0030 through U+0039
False	Any	Western (European / North American) digits
True	False	As specified in SCRIPT_CONTROL.uDefaultLanguage.
True	True	As prior strong text, defaulting to SCRIPT_CONTROL.uDefaultLanguage.

Note that in context digit mode, any digits encountered before the first letters are rendered in SCRIPT_CONTROL.uDefaultLanguage if that script is in the same direction as the output, and in Western digits if the direction is opposite. For example, if SCRIPT_CONTROL.uDefaultLanguage is LANG_ARABIC, initial digits will be Arabic-Indic in an RTL embedding, but Western (also known as Arabic numerals) in an LTR embedding.
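The digit substitution rules can be read as a small decision procedure. The C sketch below restates the table and the note on leading digits as code. It is an illustration of the documented behavior, not the Uniscribe implementation: DigitShapes and choose_digit_shapes are hypothetical names, and the three extra boolean inputs (whether strong text has been seen, and the directions of the default language and of the embedding) are assumptions about what a caller would need to track.

```c
#include <stdbool.h>

typedef enum {
    DIGITS_WESTERN,            /* U+0030 through U+0039 shown unchanged */
    DIGITS_DEFAULT_LANGUAGE,   /* shapes of SCRIPT_CONTROL.uDefaultLanguage */
    DIGITS_PRIOR_STRONG_TEXT   /* shapes matching the preceding strong text */
} DigitShapes;

/* Returns which digit shapes the table and note call for. */
DigitShapes choose_digit_shapes(bool fDigitSubstitute, bool fContextDigits,
                                bool seenStrongText,
                                bool defaultLanguageIsRtl, bool embeddingIsRtl)
{
    if (!fDigitSubstitute)
        return DIGITS_WESTERN;              /* first table row */
    if (!fContextDigits)
        return DIGITS_DEFAULT_LANGUAGE;     /* second table row */
    if (seenStrongText)
        return DIGITS_PRIOR_STRONG_TEXT;    /* third table row */
    /* Context mode, digits before the first letters: use the default
       language only if it runs in the same direction as the embedding. */
    return (defaultLanguageIsRtl == embeddingIsRtl) ? DIGITS_DEFAULT_LANGUAGE
                                                    : DIGITS_WESTERN;
}
```

With an Arabic default language (right-to-left), leading digits in context mode map to DIGITS_DEFAULT_LANGUAGE in an RTL embedding and to DIGITS_WESTERN in an LTR embedding, matching the LANG_ARABIC example.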

The following table shows the effect of Unicode control characters on SCRIPT_STATE flags.

SCRIPT_STATE flag	Set by	Cleared by
fDigitSubstitute	NADS	NODS
fInhibitSymSwap	ISS	ASS
fCharShape	AAFS	IAFS

The Unicode control characters are defined in the following table. For more information, see the Unicode Standard.

Unicode control characters	Description
NADS	Overrides Western digits (NODS) with national digit shapes.
NODS	Nominal digit shapes, otherwise known as Western digits. See NADS.
ASS	Activates swapping of symmetric pairs (for example, parentheses). For these characters, left and right are interpreted as opening and closing. This is the default. See ISS.
ISS	Inhibits swapping of symmetric pairs. See ASS.
AAFS	Activates Arabic form shaping, that is, ligatures or cursive connections, for Arabic presentation forms. See IAFS.
IAFS	Inhibits Arabic form shaping, that is, ligatures and cursive connections, for Arabic presentation forms. Nominal Arabic characters are not affected. This is the default. See AAFS.
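To make the relationship between the control characters and the SCRIPT_STATE flags concrete, the C sketch below applies one control character to a set of flags. StateFlags and apply_control_char are hypothetical names that model only the three flags listed in the flag table; the real SCRIPT_STATE is a bit-field structure with additional members. The code point values are the ones the Unicode Standard assigns to these format characters (U+206A through U+206F) and do not appear in this page.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three SCRIPT_STATE flags in the flag table. */
typedef struct {
    bool fDigitSubstitute;
    bool fInhibitSymSwap;
    bool fCharShape;
} StateFlags;

/* Unicode code points of the six control characters. */
enum {
    CP_ISS  = 0x206A,   /* inhibit symmetric swapping */
    CP_ASS  = 0x206B,   /* activate symmetric swapping */
    CP_IAFS = 0x206C,   /* inhibit Arabic form shaping */
    CP_AAFS = 0x206D,   /* activate Arabic form shaping */
    CP_NADS = 0x206E,   /* national digit shapes */
    CP_NODS = 0x206F    /* nominal digit shapes */
};

/* Updates the flags for one code point. Returns true if cp was one of
   the six control characters, false if it left the flags untouched. */
bool apply_control_char(StateFlags *state, unsigned int cp)
{
    switch (cp) {
    case CP_NADS: state->fDigitSubstitute = true;  return true;
    case CP_NODS: state->fDigitSubstitute = false; return true;
    case CP_ISS:  state->fInhibitSymSwap  = true;  return true;
    case CP_ASS:  state->fInhibitSymSwap  = false; return true;
    case CP_AAFS: state->fCharShape       = true;  return true;
    case CP_IAFS: state->fCharShape       = false; return true;
    default:      return false;
    }
}
```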

SCRIPT_STATE.fArabicNumContext controls the Unicode EN-AN rule. At the beginning of a paragraph it should normally be initialized to TRUE for an Arabic locale, FALSE for any other. The ScriptItemize function will update it as it processes strong text.

Requirements

Header	usp10.h
Library	Uspce.lib
Windows Embedded CE	Windows CE 5.0 and later
