ScriptItemize function (usp10.h)

Breaks a Unicode string into individually shapeable items.

Syntax

HRESULT ScriptItemize(
  [in]           const WCHAR          *pwcInChars,
  [in]           int                  cInChars,
  [in]           int                  cMaxItems,
  [in, optional] const SCRIPT_CONTROL *psControl,
  [in, optional] const SCRIPT_STATE   *psState,
  [out]          SCRIPT_ITEM          *pItems,
  [out]          int                  *pcItems
);

Parameters

[in] pwcInChars

Pointer to a Unicode string to itemize.

[in] cInChars

Number of characters in pwcInChars to itemize.

[in] cMaxItems

Maximum number of SCRIPT_ITEM structures defining items to process.

[in, optional] psControl

Pointer to a SCRIPT_CONTROL structure indicating the type of itemization to perform.

Alternatively, the application can set this parameter to NULL if no SCRIPT_CONTROL properties are needed. For more information, see the Remarks section.

[in, optional] psState

Pointer to a SCRIPT_STATE structure indicating the initial bidirectional algorithm state.

Alternatively, the application can set this parameter to NULL if the script state is not needed. For more information, see the Remarks section.

[out] pItems

Pointer to a buffer in which the function retrieves SCRIPT_ITEM structures representing the items that have been processed. The buffer should be (cMaxItems + 1) * sizeof(SCRIPT_ITEM) bytes in length. It is invalid to call this function with a buffer that holds fewer than two SCRIPT_ITEM structures. The function always adds a terminal item to the item analysis array, so the length of the item with zero-based index "i" is always available as:

pItems[i+1].iCharPos - pItems[i].iCharPos;
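
As an illustration of the terminal-item rule, the following minimal sketch computes item lengths. ItemStub and item_length are hypothetical names, used here only so the example compiles without the Windows SDK; real code would read iCharPos from the SCRIPT_ITEM array filled by ScriptItemize.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SCRIPT_ITEM, reduced to the one field used here. */
typedef struct {
    int iCharPos;   /* zero-based offset of the item's first character */
} ItemStub;

/* Length in characters of item i, for 0 <= i < cItems.
   The array holds cItems + 1 entries; the last one is the terminal item,
   so items[i + 1] is valid even for the last real item. */
int item_length(const ItemStub *items, int cItems, int i)
{
    assert(items != NULL && i >= 0 && i < cItems);
    return items[i + 1].iCharPos - items[i].iCharPos;
}
```

With three items starting at offsets 0, 5, and 9 and a terminal item at offset 12, the lengths are 5, 4, and 3.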

[out] pcItems

Pointer to the number of SCRIPT_ITEM structures processed.

Return value

Returns S_OK (0) if successful. Otherwise, the function returns a nonzero HRESULT value.

The function returns E_INVALIDARG if pwcInChars is set to NULL, cInChars is 0, pItems is set to NULL, or cMaxItems < 2.

The function returns E_OUTOFMEMORY if the value of cMaxItems is insufficient. As in all error cases, no items are fully processed and no part of the output array contains defined values. If the function returns E_OUTOFMEMORY, the application can call it again with a larger pItems buffer.
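
The retry described above can be sketched as a grow-and-retry loop. This is one possible approach, not a prescribed one. To keep the example buildable without the Windows SDK, the call is modeled by a hypothetical callback (itemize_fn) that carries only the cMaxItems, pItems, and pcItems parameters, and the RC_* codes stand in for S_OK, E_OUTOFMEMORY, and any other failure. The doubling strategy and the fake_itemize test double are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-ins so the sketch builds without the Windows SDK. */
typedef struct { int iCharPos; } RetryItemStub;              /* for SCRIPT_ITEM */
enum { RC_OK = 0, RC_OUT_OF_MEMORY = 1, RC_OTHER_ERROR = 2 }; /* for HRESULTs */

/* Callback carrying the output-related part of the ScriptItemize signature. */
typedef int (*itemize_fn)(int cMaxItems, RetryItemStub *pItems, int *pcItems);

/* Calls itemize with a growing buffer until it stops reporting
   RC_OUT_OF_MEMORY. Returns a malloc'd array the caller must free,
   or NULL on any other failure. */
RetryItemStub *itemize_with_retry(itemize_fn itemize, int cMaxItems, int *pcItems)
{
    RetryItemStub *items = NULL;
    assert(itemize != NULL && pcItems != NULL);
    if (cMaxItems < 2)
        cMaxItems = 2;   /* smaller values are rejected as invalid arguments */
    for (;;) {
        /* cMaxItems + 1 entries: one extra slot for the terminal item. */
        RetryItemStub *grown =
            realloc(items, ((size_t)cMaxItems + 1) * sizeof *grown);
        int rc;
        if (grown == NULL) { free(items); return NULL; }
        items = grown;
        rc = itemize(cMaxItems, items, pcItems);
        if (rc == RC_OK)
            return items;
        if (rc != RC_OUT_OF_MEMORY || cMaxItems > INT_MAX / 2) {
            free(items);   /* output is undefined after an error */
            return NULL;
        }
        cMaxItems *= 2;    /* assumption: doubling is a reasonable growth step */
    }
}

/* Test double (not part of Uniscribe): pretends the text has five items. */
int fake_itemize(int cMaxItems, RetryItemStub *pItems, int *pcItems)
{
    int i;
    if (cMaxItems < 5)
        return RC_OUT_OF_MEMORY;
    for (i = 0; i <= 5; i++)
        pItems[i].iCharPos = i * 3;
    *pcItems = 5;
    return RC_OK;
}
```

In real code the callback body would be the ScriptItemize call itself, with the result compared against E_OUTOFMEMORY.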

Remarks

See Displaying Text with Uniscribe for a discussion of the context in which this function is normally called.

The function delimits items by either a change of shaping engine or a change of direction.

The application can create multiple ranges, or runs that fall entirely within a single item, from each SCRIPT_ITEM structure retrieved by ScriptItemize. However, it should not combine multiple items into a single run. Later, when measuring or rendering, the application can call ScriptShape for each run and must pass the SCRIPT_ANALYSIS structure retrieved by ScriptItemize in the SCRIPT_ITEM structure.
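
One way to derive such runs is to intersect the item boundaries with the application's own formatting boundaries, so that no run ever spans two items. The sketch below assumes the application keeps its style ranges as an ascending array of offsets (stylePos) that starts at the same offset as the first item. Run, split_into_runs, and stylePos are hypothetical names, and itemPos stands for the iCharPos values of the items plus the terminal item.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical run record: a character range plus the index of its item. */
typedef struct {
    int start;  /* first character of the run */
    int end;    /* one past the last character of the run */
    int item;   /* index of the item that contains the run */
} Run;

/* itemPos:  cItems + 1 ascending offsets (each item's iCharPos plus the
             terminal item's iCharPos).
   stylePos: cStyles + 1 ascending offsets covering the same text
             (application data, assumed for this sketch).
   Returns the number of runs written, or -1 if maxRuns is too small. */
int split_into_runs(const int *itemPos, int cItems,
                    const int *stylePos, int cStyles,
                    Run *runs, int maxRuns)
{
    int i = 0, s = 0, n = 0, pos;
    assert(itemPos != NULL && stylePos != NULL && runs != NULL);
    pos = itemPos[0];
    while (i < cItems && s < cStyles) {
        int itemEnd = itemPos[i + 1];
        int styleEnd = stylePos[s + 1];
        int end = itemEnd < styleEnd ? itemEnd : styleEnd;
        if (end > pos) {                 /* skip empty ranges */
            if (n == maxRuns)
                return -1;
            runs[n].start = pos;
            runs[n].end = end;
            runs[n].item = i;            /* the run stays inside item i */
            n++;
            pos = end;
        }
        if (itemEnd == end) i++;         /* an item boundary always ends a run */
        if (styleEnd == end) s++;
    }
    return n;
}
```

Each resulting run would then be shaped with the SCRIPT_ANALYSIS of the item recorded in its item field.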

If the text handled by an application can include any right-to-left content, the application should pass the psControl and psState parameters when calling ScriptItemize. However, the application does not have to do this and can handle bidirectional text itself instead of relying on Uniscribe to do so. The psControl and psState parameters are also useful in some strictly left-to-right scenarios; for example, the fLinkStringBefore member of SCRIPT_CONTROL is not specific to right-to-left scripts. The application sets psControl and psState to NULL to have ScriptItemize break the Unicode string purely by character code.

The application can set all parameters to non-NULL values to have the function perform a full Unicode bidirectional analysis. To permit a correct Unicode bidirectional analysis, the SCRIPT_STATE structure should be initialized according to the reading order at paragraph start, and ScriptItemize should be passed the whole paragraph. In particular, the uBidiLevel member should be initialized to 0 for left-to-right and 1 for right-to-left.

The fRTL member of SCRIPT_ANALYSIS is referenced in SCRIPT_ITEM. The fNumeric member of SCRIPT_PROPERTIES is retrieved by ScriptGetProperties. These members together provide the same classification as the lpClass member of GCP_RESULTS, referenced by lpResults in GetCharacterPlacement.

European digits U+0030 through U+0039 can be rendered as national digits, as shown in the following table.

| SCRIPT_STATE.fDigitSubstitute | SCRIPT_CONTROL.fContextDigits | Digit shapes displayed for Unicode U+0030 through U+0039 |
|---|---|---|
| FALSE | Any | European digits |
| TRUE | FALSE | As specified in uDefaultLanguage member of SCRIPT_CONTROL. |
| TRUE | TRUE | As prior strong text, defaulting to uDefaultLanguage member of SCRIPT_CONTROL. |

In context digit mode, one of the following actions occurs:

  • If the script specified by uDefaultLanguage is in the same direction as the output, all digits encountered before the first letters are rendered in the language indicated by uDefaultLanguage.
  • If the script specified by uDefaultLanguage is in the opposite direction from the output, all digits encountered before the first letters are rendered in European digits.
For example, if uDefaultLanguage indicates LANG_ARABIC, initial digits are in Arabic-Indic in a right-to-left embedding. However, they are in European digits in a left-to-right embedding.
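
The digit rules above, for digits that appear before any letters, can be summarized in a small decision function. This is a model of the documented behavior and not Uniscribe's implementation. The stub structures carry only the flags named in the table; the two direction flags (defaultLangIsRtl, embeddingIsRtl) and all type and function names are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins holding only the flags used by the digit rules. */
typedef struct { unsigned fDigitSubstitute : 1; } DigitStateStub;   /* SCRIPT_STATE */
typedef struct { unsigned fContextDigits  : 1; } DigitControlStub;  /* SCRIPT_CONTROL */

typedef enum {
    DIGITS_EUROPEAN,   /* U+0030 through U+0039 shown as is */
    DIGITS_NATIONAL    /* shapes of the language in uDefaultLanguage */
} DigitShape;

/* Shape used for digits that precede the first letters of the text.
   defaultLangIsRtl: nonzero if the uDefaultLanguage script is right-to-left.
   embeddingIsRtl:   nonzero if the output embedding is right-to-left. */
DigitShape initial_digit_shape(const DigitStateStub *st,
                               const DigitControlStub *ctl,
                               int defaultLangIsRtl, int embeddingIsRtl)
{
    assert(st != NULL && ctl != NULL);
    if (!st->fDigitSubstitute)
        return DIGITS_EUROPEAN;          /* substitution is off */
    if (!ctl->fContextDigits)
        return DIGITS_NATIONAL;          /* always use uDefaultLanguage */
    /* Context digit mode: national shapes only when directions agree. */
    return (!defaultLangIsRtl == !embeddingIsRtl) ? DIGITS_NATIONAL
                                                  : DIGITS_EUROPEAN;
}
```

For the LANG_ARABIC example, the function yields national (Arabic-Indic) digits in a right-to-left embedding and European digits in a left-to-right embedding.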

For more information, see Digit Shapes.

The Unicode control characters and definitions, and their effects on SCRIPT_STATE members, are provided in the following table. For more information on Unicode control characters, see The Unicode Standard.

| Unicode control characters | Meaning | Effect on SCRIPT_STATE |
|---|---|---|
| NADS | Override European digits (NODS) with national digit shapes. | Set fDigitSubstitute. |
| NODS | Use nominal digit shapes, otherwise known as European digits. See NADS. | Clear fDigitSubstitute. |
| ASS | Activate swapping of symmetric pairs, for example, parentheses. For these characters, left and right are interpreted as opening and closing. This is the default. See ISS. | Clear fInhibitSymSwap. |
| ISS | Inhibit swapping of symmetric pairs. See ASS. | Set fInhibitSymSwap. |
| AAFS | Activate Arabic form shaping for Arabic presentation forms. See IAFS. | Set fCharShape. |
| IAFS | Inhibit Arabic form shaping, that is, ligatures and cursive connections, for Arabic presentation forms. Nominal Arabic characters are not affected. This is the default. See AAFS. | Clear fCharShape. |
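
The table can be read as a simple state update per control character. The sketch below models that mapping. It is not Uniscribe's code. ControlStateStub is a hypothetical stand-in for the three SCRIPT_STATE members involved, and the code points are those assigned to these format characters in The Unicode Standard (U+206A through U+206F).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the SCRIPT_STATE members named in the table. */
typedef struct {
    unsigned fInhibitSymSwap  : 1;
    unsigned fCharShape       : 1;
    unsigned fDigitSubstitute : 1;
} ControlStateStub;

/* Code points of the six control characters. */
enum {
    CH_ISS  = 0x206A,  /* inhibit symmetric swapping */
    CH_ASS  = 0x206B,  /* activate symmetric swapping */
    CH_IAFS = 0x206C,  /* inhibit Arabic form shaping */
    CH_AAFS = 0x206D,  /* activate Arabic form shaping */
    CH_NADS = 0x206E,  /* national digit shapes */
    CH_NODS = 0x206F   /* nominal digit shapes */
};

/* Applies the effect listed in the table for ch.
   Returns 1 if ch is one of the six control characters, otherwise 0. */
int apply_control_char(ControlStateStub *st, unsigned ch)
{
    assert(st != NULL);
    switch (ch) {
    case CH_NADS: st->fDigitSubstitute = 1; return 1;
    case CH_NODS: st->fDigitSubstitute = 0; return 1;
    case CH_ASS:  st->fInhibitSymSwap  = 0; return 1;
    case CH_ISS:  st->fInhibitSymSwap  = 1; return 1;
    case CH_AAFS: st->fCharShape       = 1; return 1;
    case CH_IAFS: st->fCharShape       = 0; return 1;
    default:      return 0;   /* not a control character from the table */
    }
}
```

A zero-initialized ControlStateStub corresponds to the defaults named in the table: symmetric swapping active and Arabic form shaping inhibited.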
 

The fArabicNumContext member of SCRIPT_STATE supports the context-sensitive display of numerals in Arabic script text. It indicates if digits are rendered using native Arabic script digit shapes or European digits. At the beginning of a paragraph, this member should normally be initialized to TRUE for an Arabic locale, or FALSE for any other locale. The function updates the script state as it processes strong text.
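
A paragraph-start initialization consistent with these remarks might look like the following sketch. ParagraphStateStub is a hypothetical stand-in for the two SCRIPT_STATE members discussed here (uBidiLevel and fArabicNumContext), and how the caller determines rtlParagraph and arabicLocale is left as an assumption. Real code would zero a SCRIPT_STATE and set the same two members before passing it as psState.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two SCRIPT_STATE members set at paragraph start. */
typedef struct {
    unsigned uBidiLevel        : 5;  /* 0 = left-to-right, 1 = right-to-left */
    unsigned fArabicNumContext : 1;  /* TRUE for an Arabic locale */
} ParagraphStateStub;

/* Fills *st for the start of a paragraph.
   rtlParagraph: nonzero if the paragraph reading order is right-to-left.
   arabicLocale: nonzero if the locale is Arabic (caller-supplied assumption). */
void init_paragraph_state(ParagraphStateStub *st, int rtlParagraph, int arabicLocale)
{
    assert(st != NULL);
    st->uBidiLevel = rtlParagraph ? 1u : 0u;
    st->fArabicNumContext = arabicLocale ? 1u : 0u;
}
```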

Important: Starting with Windows 8, to maintain the ability to run on Windows 7, a module that uses Uniscribe must specify Usp10.lib before gdi32.lib in its library list.

Requirements

| Requirement | Value |
|---|---|
| Minimum supported client | Windows 2000 Professional [desktop apps only] |
| Minimum supported server | Windows 2000 Server [desktop apps only] |
| Target Platform | Windows |
| Header | usp10.h |
| Library | Usp10.lib |
| DLL | Usp10.dll |
| Redistributable | Internet Explorer 5 or later on Windows Me/98/95 |

See also

Displaying Text with Uniscribe

SCRIPT_ANALYSIS

SCRIPT_CONTROL

SCRIPT_ITEM

SCRIPT_PROPERTIES

SCRIPT_STATE

ScriptItemizeOpenType

ScriptShape

Uniscribe

Uniscribe Functions