Regular Expression Support in Microsoft Office System Smart Tags

 

Markus Egger
EPS Software Corporation

August 2003

Applies to:
     Microsoft® Office Excel 2003
     Microsoft Office Word 2003

Summary: Regular expressions make smart tags more accurate at identifying text that should be linked to smart tag actions. Learn how to use regular expressions within Microsoft Office smart tag list files to identify meaningful sections and words in Office documents. Learn about regular expression syntax and how to identify different types of text for use with simple actions or more sophisticated ones. (17 printed pages)

Contents

Introduction
Microsoft Office Smart Tag Lists
Smart Tag Lists and Regular Expressions
A Brief Introduction to Regular Expression Syntax
Some Sample Regular Expression Smart Tags
More Sophisticated Actions
Conclusion

Introduction

Developers tend to think of data as information stored in tables that are, in turn, contained in a relational database system. This is only the case for a subset of data. A large amount of information is stored in e-mails, spreadsheets, and documents. Microsoft Office XP was first to unlock this type of data using a technology known as smart tags.

Smart tags identify words and phrases within a document. Type MSFT in a Microsoft Word 2003 document for instance, and Word 2003 indicates that this "word" was recognized as a financial symbol (assuming the Financial Symbol smart tag is activated). Hover the mouse over the recognized word, and a little icon (the smart tag) appears that you can expand to a whole menu with a single mouse-click. The menu provides a list of actions that are useful for the financial symbol, as shown in Figure 1.

Figure 1. The smart tag drop-down menu for the financial symbol MSFT

Although stock symbols make great generic examples, there are even more useful and individualized smart tags that are possible. A company could choose to generate smart tags that recognize product names or SKUs for instance, and then link those smart tags to functionality in their own enterprise system.

From a developer's point of view, there are two ways to implement custom smart tags. One is to implement a number of interfaces in a language such as Microsoft Visual Basic 6.0, Microsoft Visual Basic .NET, or Microsoft Visual C#, compile the result, and register it with Office. The other way is to generate an XML-based smart tag list, which is much easier in many scenarios.

Microsoft Office Smart Tag Lists

Smart tag lists are smart tag recognizers defined in Extensible Markup Language (XML). Smart tag lists are not quite as flexible as smart tags implemented in a programming language, but they are great whenever you have a somewhat restricted list of words or phrases to recognize. For example, you could create a smart tag list that recognizes the product names Office, TabletPC, and Windows, and subsequently links to the Microsoft Web site of each individual product. This could be done with the following XML definition:

<?xml version="1.0" encoding="UTF-16"?>
<FL:smarttaglist xmlns:FL="urn:schemas-microsoft-com:smarttags:list">
   <FL:name>Microsoft Products</FL:name>
   <FL:lcid>1033</FL:lcid>
   <FL:description>Recognizes 3 Microsoft Products</FL:description>
   <FL:moreinfourl>http://www.Microsoft.com</FL:moreinfourl>

   <FL:smarttag type="urn:schemas-manualtags#msproducts">
      <FL:caption>Microsoft Products (MOSTL)</FL:caption>
      <FL:terms>
         <FL:termlist>Windows,TabletPC,Office</FL:termlist>
      </FL:terms>
      <FL:actions>
         <FL:action id="ms">
            <FL:caption>Microsoft Website</FL:caption>
            <FL:url>http://www.microsoft.com/{TEXT}</FL:url>
         </FL:action>
      </FL:actions>
   </FL:smarttag>
</FL:smarttaglist>

To use, you must store this code as an XML file in the following directory:

local_drive:\Program Files\Common Files\Microsoft Shared\Smart Tag\Lists\

For this to take effect, you must close all smart tag-compliant Office 2003 applications (such as Word 2003). The next time any smart tag-compliant application starts, Office adds this definition to the list of smart tag recognizers, and whenever a user types one of the three supported product names, the application highlights the product name, as seen in Figure 2.

Figure 2. The smart tag takes you to the Microsoft Web site.

By default, the new smart tag list is enabled. If it is not enabled (or has been disabled for some reason), you can turn it on using AutoCorrect Options from the Tools menu, as shown in Figure 3:

Figure 3. Use the AutoCorrect dialog box to specify smart tag behavior

Smart tag lists provide a very convenient way to add a well-defined list of terms to recognize automatically. The above example represents only a small set of features available in smart tag lists. In Office 2003, this concept is broadened by allowing more dynamic recognition of terms through the support of regular expression pattern matching.
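When the list of terms is long, you might prefer to generate the XML rather than type it by hand. The following is a minimal sketch in Python, not part of Office or the Smart Tag SDK. The helper names fl and build_term_list are hypothetical, and the element names are simply copied from the listing above. The sketch returns the markup as a string; adding the XML declaration and saving the file with UTF-16 encoding in the Lists directory is left out.

```python
import xml.etree.ElementTree as ET

# Namespace URI and "FL" prefix copied from the article's listing.
NS = "urn:schemas-microsoft-com:smarttags:list"
ET.register_namespace("FL", NS)


def fl(tag):
    """Return the namespace-qualified element name ElementTree expects."""
    return "{%s}%s" % (NS, tag)


def build_term_list(name, tag_type, terms, caption, url_template):
    """Build a term-based smart tag list (hypothetical helper)."""
    root = ET.Element(fl("smarttaglist"))
    ET.SubElement(root, fl("name")).text = name
    ET.SubElement(root, fl("lcid")).text = "1033"
    ET.SubElement(root, fl("description")).text = (
        "Recognizes %d terms" % len(terms))

    smarttag = ET.SubElement(root, fl("smarttag"), {"type": tag_type})
    ET.SubElement(smarttag, fl("caption")).text = name
    terms_el = ET.SubElement(smarttag, fl("terms"))
    ET.SubElement(terms_el, fl("termlist")).text = ",".join(terms)

    actions = ET.SubElement(smarttag, fl("actions"))
    action = ET.SubElement(actions, fl("action"), {"id": "ms"})
    ET.SubElement(action, fl("caption")).text = caption
    ET.SubElement(action, fl("url")).text = url_template
    return ET.tostring(root, encoding="unicode")


xml_text = build_term_list(
    "Microsoft Products",
    "urn:schemas-manualtags#msproducts",
    ["Windows", "TabletPC", "Office"],
    "Microsoft Website",
    "http://www.microsoft.com/{TEXT}",
)
print(xml_text)
```

Because the terms are joined with commas, a term that itself contains a comma would need separate handling, which this sketch does not attempt.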

Note   For a complete list of features, see the Microsoft Office 2003 Smart Tag Software Development Kit.

Smart Tag Lists and Regular Expressions

The Smart Tag List tool allows you to define search patterns in addition to search lists. These search patterns are defined through regular expressions. A regular expression is a powerful form of a wildcard search. Most developers and some power users are familiar with using wildcards to search for files, such as *.doc. Regular expressions follow the same idea, but when it comes to pattern matching in Office documents, you generally apply more sophisticated patterns. The following pattern, for instance, finds IP addresses within a text string:

(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.
(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])

Note   The pattern above shows a regular expression, not a conventional line of code. There should be no line breaks or line continuation characters in regular expressions.

This is a somewhat sophisticated regular expression, and might initially be difficult for people who are not familiar with such expressions to decipher. For now, the details will be left for later. The following example shows how to embed the regular expression into a smart tag list XML file:

<?xml version="1.0" encoding="UTF-16"?>
<FL:smarttaglist xmlns:FL="urn:schemas-microsoft-com:smarttags:list">
   <FL:name>IP Addresses</FL:name>
   <FL:description>Recognizes IP Addresses</FL:description>
   <FL:lcid>1033</FL:lcid>
   <FL:smarttag type="urn:schemas-manualtags#ipaddresses">
      <FL:caption>IP Address (MOSTL)</FL:caption>
      <FL:re>
         <FL:exp>(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])</FL:exp>
      </FL:re>
      <FL:actions>
         <FL:action id="ip">
            <FL:caption>Browse to Address</FL:caption>
            <FL:url>http://{TEXT}/</FL:url>
         </FL:action>
         <FL:action id="ips">
            <FL:caption>Browse to secure Address</FL:caption>
            <FL:url>https://{TEXT}/</FL:url>
         </FL:action>
      </FL:actions>
   </FL:smarttag>
</FL:smarttaglist>

Let's take a close look at what the most important individual elements of this XML definition file mean. Fundamentally, you create an XML definition file that uses the FL namespace as defined by Microsoft Corporation. Subsequently, you provide a name and description for the recognizer, as well as a locale ID, for example, 1033 for US-English. Then, you define the actual smart tag. To do so, you must provide a caption for the smart tag to display in the smart tag menu, recognition information such as a term list or, as in the example, regular expressions, and associated actions.

In this scenario, you provide only one regular expression, for the purpose of identifying IP addresses. The pattern is defined in the <FL:exp> tag, which is a child of the <FL:re> tag. You could define multiple expressions within the same <FL:re> tag, which is useful for recognizing IP addresses as well as domain names, as shown later in this article. Finally, you define the possible actions in the <FL:actions> tag. In this example, you define two different actions that end up as separate menu items on the smart tag. Both actions open a browser window and navigate to the IP address, one using the HTTP protocol, and the other using the secure HTTPS counterpart. Each individual action is defined in an <FL:action> tag. Each action requires a unique ID, as well as a caption (to be displayed in the menu), and a URL to which the action links.
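To make the role of the {TEXT} placeholder concrete, here is a small Python sketch that imitates the substitution with a plain string replacement. Office performs this step internally; the function name expand_action_url is hypothetical, and the sketch assumes that the recognized text is inserted unchanged.

```python
def expand_action_url(url_template, recognized_text):
    """Imitate the {TEXT} substitution with a plain string replacement."""
    return url_template.replace("{TEXT}", recognized_text)


# The two URL templates from the IP address listing above.
templates = ["http://{TEXT}/", "https://{TEXT}/"]
urls = [expand_action_url(t, "192.168.0.1") for t in templates]
print(urls)
```

Each template yields one menu item, so the recognized address 192.168.0.1 leads to one HTTP link and one HTTPS link.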

Save this definition into the smart tag list directory (see above), and launch an Office application, such as Word 2003, and type an IP address. The result is shown in Figure 4.

Figure 4. You can use a smart tag to identify a specific IP address.

This works in other Office 2003 applications as well without additional changes. Figure 5 shows the same smart tag used in Microsoft Excel 2003.

Figure 5. The same smart tag also works in Excel 2003

A Brief Introduction to Regular Expression Syntax

Up to this point we have seen a regular expression-driven smart tag at work, but unless you are already familiar with regular expression syntax, how it works may not yet be clear. To explain further, regular expressions match strings based on a provided pattern. In its simplest form, that pattern could be an exact example of what we are looking for, such as:

Microsoft

This regular expression matches every occurrence of the string Microsoft. You could use this as the expression of your smart tag definition file, and you would, in fact, see a smart tag highlighting every occurrence of "Microsoft" in the document. This is less than exciting, and there are easier ways to do this than using regular expressions. But what if you wanted a little more flexibility and also recognize "microsoft", "MicroSoft", and perhaps even "microSoft"? You could do so by changing the pattern to something a bit more flexible:

[Mm]icro[Ss]oft

Square brackets indicate that any one of the characters within them is a valid match. So the first character has to be either M or m, followed by icro, followed by either an S or an s, followed by oft. The same either/or choice can also be written with the | character inside parentheses, as in (M|m)icro(S|s)oft. Inside square brackets, however, the | character is treated as an ordinary character, so [M|m] would also match a literal |. What is interesting here is that this pattern is rather easy to write, but it isn't quite as easy to read, making it difficult to explain regular expressions.

You could have also indicated ranges using very similar syntax:

[A-Z]icrosoft

This pattern applies to Microsoft as well as to Aicrosoft, Bicrosoft, Cicrosoft, and so forth. On the other hand, this does not match microsoft, as the range only includes upper case characters. To also include lower case characters, you have to specify the following expression:

[A-Za-z]icrosoft
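You can try character sets and ranges outside of Office before you put them into a smart tag list. The following sketch uses Python's re module as a stand-in, on the assumption that the engine used by Office treats these basic constructs the same way. In Python, a pipe inside square brackets is an ordinary character, so the set is written as [Mm] here.

```python
import re

# A set of alternatives for the first and sixth character.
flexible = re.compile(r"[Mm]icro[Ss]oft")
words = ["Microsoft", "microsoft", "MicroSoft", "microSoft", "MICROSOFT"]
flexible_hits = [w for w in words if flexible.fullmatch(w)]
print(flexible_hits)

# Inside square brackets, Python treats | as a literal character.
pipe_is_literal = re.fullmatch(r"[M|m]", "|") is not None
print(pipe_is_literal)

# A range that covers upper and lower case letters.
ranged = re.compile(r"[A-Za-z]icrosoft")
candidates = ["Microsoft", "microsoft", "Bicrosoft", "1icrosoft"]
ranged_hits = [w for w in candidates if ranged.fullmatch(w)]
print(ranged_hits)
```

The all-capitals spelling and the candidate that starts with a digit are the only strings rejected.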

You have now already seen two special characters: the combination of the open and close square brackets ([ and ]), as well as the pipe character (|). There are many other special characters that can be used in regular expressions, such as a period (.) that matches any character except the new-line character. You can use curly brackets ({ and }) to match a specific number of characters. Consider the following pattern as an example:

Microsoft{3}

Curly brackets indicate that you are looking for three ts at the end of the string (the brackets apply to the character or group preceding the curly bracket expression). Therefore, Microsoft{3} is the equivalent of the following pattern:

Microsofttt

Curly brackets are most useful when you attempt to match a flexible number of characters, as in this example:

Microsoft{1,3}

This recognizes Microsoft, Microsoftt, and Microsofttt.

You can also group sections of the pattern using parentheses:

(Microsoft)

This allows you to apply subsequent regular expressions to the entire group:

(Microsoft){1,3}

This matches not only Microsoft, but also MicrosoftMicrosoft and MicrosoftMicrosoftMicrosoft.
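The difference between repeating a single character and repeating a group is easy to check. The sketch below again uses Python's re module as an assumed stand-in for the Office engine; the helper fully_matches is hypothetical and requires the whole string to match.

```python
import re


def fully_matches(pattern, text):
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# {3} applies only to the preceding character, the final t.
exact_three = fully_matches(r"Microsoft{3}", "Microsofttt")
plain_word = fully_matches(r"Microsoft{3}", "Microsoft")

# {1,3} allows one to three ts.
two_ts = fully_matches(r"Microsoft{1,3}", "Microsoftt")
four_ts = fully_matches(r"Microsoft{1,3}", "Microsoftttt")

# With parentheses, the quantifier applies to the whole word.
twice = fully_matches(r"(Microsoft){1,3}", "Microsoft" * 2)
four_times = fully_matches(r"(Microsoft){1,3}", "Microsoft" * 4)

print(exact_three, plain_word, two_ts, four_ts, twice, four_times)
```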

Beyond these special characters, there are escape sequences with special meaning, such as \s, \w, and \d, among others. The sequence \s matches any white-space character, such as a space or a tab. The sequence \w matches any alphanumeric character, including the underscore. The sequence \d matches any single digit.

Based on what you have explored so far, you can now go back and start to analyze the first section of the IP address pattern used above:

(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])

This defines the first group of numbers that can occur in an IP address. The first pattern (\d{1,2}) matches any number of one or two digits, such as 1, 10, or 99.

The pattern continues with the bar (|), indicating that alternately, you could also encounter a different string, such as:

1\d\d

This means that there is a 1 followed by two digits. This matches numbers such as 100, 150, or 199. Between the two expressions you have examined so far, you can cover a number range from 0 to 199, but of course IP addresses span a larger range. Therefore, you have two more alternate patterns for the first group of numbers. Here's the next string that you may encounter:

2[0-4]\d

In this case, you are looking for a 2, followed by a 0, 1, 2, 3, or 4, followed by any single digit. This pattern matches any number from 200 to 249. But once again, there is more.

25[0-5]

This time, you are looking for 25 followed by a digit between 0 and 5, resulting in a range from 250 to 255. You have now defined all alternatives to match numbers ranging from 0 to 255 within a string as a single group, indicated by the parentheses around the pattern. Remember that the pattern matches digit characters as text; it does not compare true numeric values.

Of course, the complete pattern is quite a bit longer. The part you examined so far is only the first section of an IP address. IP addresses have four sets of numbers separated by periods. This separation is indicated by the following expression:

\.

In this segment, you are really only looking for a period. But as you remember, the period is a special character in regular expressions, matching any character except the new line character. This is not what you need; you need a way to tell the regular expression engine that you do not want to treat this period as a special character at all, but instead only match the period character. You can do this using the backslash character preceding the period.

The rest of the pattern is just more of what you already examined for the three remaining number sets.
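Because the four number groups are identical, the complete pattern can be assembled from one group and the escaped period. The following Python sketch does that and tries the result on a few strings. It uses Python's re module, which is an assumption; how the Office recognizer delimits a match inside running text may differ, so the checks use fullmatch to test whole candidate strings.

```python
import re

# One number group, exactly as analyzed above.
OCTET = r"(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])"

# Four groups joined by an escaped period.
IP_PATTERN = re.compile(r"\.".join([OCTET] * 4))

candidates = ["192.168.0.1", "10.0.0.255", "256.1.1.1", "1.2.3"]
results = {c: IP_PATTERN.fullmatch(c) is not None for c in candidates}
for candidate, ok in results.items():
    print(candidate, ok)

# Searching inside a sentence finds the embedded address.
found = IP_PATTERN.search("Server 192.168.0.1 is up").group(0)
print(found)
```

The string 256.1.1.1 is rejected as a whole because none of the four alternatives accepts 256, and 1.2.3 is rejected because it has only three number groups.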

Note that this brief introduction to regular expression syntax hardly scratches the surface of what's possible. To delve further into this powerful pattern-matching language, you can find numerous articles, white papers, and books written on the subject.

Some Sample Regular Expression Smart Tags

To give you some ideas of what can be accomplished with regular expression technology, this document includes a few sample smart tags. You have already seen a smart tag that finds IP addresses. The following very similar example uses a regular expression to match URLs:

(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*

This regular expression requires the URL to start out with http://, https://, or ftp://. You can relax these rules a bit and ignore the protocol (such as http) using the following pattern:

[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*
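The effect of dropping the protocol prefix can be checked with a few sample strings. This is a sketch in Python's re module, used here as an assumed stand-in for the Office engine; the variable names are hypothetical, and the sample addresses are only illustrations.

```python
import re

# The relaxed pattern: host name, optional port, optional path.
DOMAIN_PART = (r"[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?"
               r"([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*")

# The strict pattern adds the required protocol prefix in front.
WITH_PROTOCOL = re.compile(r"(http|https|ftp)\://" + DOMAIN_PART)
WITHOUT_PROTOCOL = re.compile(DOMAIN_PART)

strict_full_url = WITH_PROTOCOL.fullmatch(
    "http://www.microsoft.com/office") is not None
strict_bare_host = WITH_PROTOCOL.fullmatch("www.microsoft.com") is not None
relaxed_bare_host = WITHOUT_PROTOCOL.fullmatch("www.microsoft.com") is not None
relaxed_no_dot = WITHOUT_PROTOCOL.fullmatch("localhost") is not None

print(strict_full_url, strict_bare_host, relaxed_bare_host, relaxed_no_dot)
```

The relaxed pattern accepts a bare host name, but it still requires at least one period followed by two or three letters, so a name such as localhost is not recognized.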

Then you could combine this pattern with the previously used IP address example, allowing you to browse to IP addresses as well as domains. Here is the complete XML definition file for that example:

<?xml version="1.0" encoding="UTF-16"?>
<FL:smarttaglist xmlns:FL="urn:schemas-microsoft-com:smarttags:list">
   <FL:name>IP Addresses</FL:name>
   <FL:lcid>1033</FL:lcid>
   <FL:description>Recognizes IP Addresses and Domains</FL:description>
   <FL:smarttag type="urn:schemas-manualtags#ipaddresses">
      <FL:caption>IP Address/ Domain (MOSTL)</FL:caption>
      <FL:re>
         <FL:exp>(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])</FL:exp>
         <FL:exp>[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&amp;%\$#\=~])*</FL:exp>
      </FL:re>
      <FL:actions>
         <FL:action id="ip">
            <FL:caption>Browse to Address</FL:caption>
            <FL:url>http://{TEXT}</FL:url>
         </FL:action>
         <FL:action id="ips">
            <FL:caption>Browse to secure Address</FL:caption>
            <FL:url>https://{TEXT}</FL:url>
         </FL:action>
      </FL:actions>
   </FL:smarttag>
</FL:smarttaglist>

Note   The ampersand (&) was replaced with &amp; in this instance, in order to form valid XML.
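If you assemble list files with a script, the escaping can be done for you. The sketch below uses the escape function from Python's standard xml.sax.saxutils module, which replaces the ampersand and the angle brackets; the pattern fragment is a shortened piece of the expression above, chosen only for illustration.

```python
from xml.sax.saxutils import escape

# A shortened fragment of the domain pattern that contains an ampersand.
fragment = r"[a-zA-Z0-9\+&%\$#\=~]*"

# escape() rewrites & as &amp; (and < and > as &lt; and &gt;).
escaped = escape(fragment)
element = "<FL:exp>%s</FL:exp>" % escaped
print(element)
```

Backslashes and the other regular expression characters pass through unchanged, because they have no special meaning in XML text.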

A logical follow-up for this example is a pattern that finds e-mail addresses. Here's the regular expression to do it:

([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)

Once again, the following example shows the complete definition file:

<?xml version="1.0" encoding="UTF-16"?>
<FL:smarttaglist xmlns:FL="urn:schemas-microsoft-com:smarttags:list">
   <FL:name>E-mail Addresses</FL:name>
   <FL:lcid>1033</FL:lcid>
   <FL:description>Recognizes E-mail Addresses</FL:description>
   <FL:smarttag type="urn:schemas-manualtags#email">
      <FL:caption>E-mail Address (MOSTL)</FL:caption>
      <FL:re>
         <FL:exp>([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)</FL:exp>
      </FL:re>
      <FL:actions>
         <FL:action id="ip">
            <FL:caption>Send E-Mail</FL:caption>
            <FL:url>mailto:{TEXT}</FL:url>
         </FL:action>
      </FL:actions>
   </FL:smarttag>
</FL:smarttaglist>

The URL simply links to a mailto: address, which automatically launches the default e-mail client.
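The e-mail pattern accepts both domain names and bracketed IP addresses after the @ sign. The following Python sketch, which again assumes that Python's re module behaves like the Office engine for this pattern, checks a few made-up addresses and builds the resulting mailto: link.

```python
import re

EMAIL = re.compile(
    r"([a-zA-Z0-9_\-\.]+)@"
    r"((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))"
    r"([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)"
)

samples = ["someone@example.com", "someone@[192.168.0.1]",
           "someone.example.com"]
accepted = [s for s in samples if EMAIL.fullmatch(s)]
print(accepted)

# Find an address inside a sentence and build the action URL from it.
match = EMAIL.search("Please write to someone@example.com today.")
link = "mailto:" + match.group(0)
print(link)
```

The third sample is rejected because it contains no @ sign.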

Here is an interesting regular expression. You can match phone numbers and find an address, a map, and driving directions. This is a pattern to find a US phone number:

((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}

And the following example shows the matching XML definition:

<?xml version="1.0" encoding="UTF-16"?>
<FL:smarttaglist xmlns:FL="urn:schemas-microsoft-com:smarttags:list">
   <FL:name>Phone Number</FL:name>
   <FL:lcid>1033</FL:lcid>
   <FL:description>Recognizes Phone Numbers</FL:description>
   <FL:smarttag type="urn:schemas-manualtags#phone">
      <FL:caption>Phone Number (MOSTL)</FL:caption>
      <FL:re>
         <FL:exp>((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}</FL:exp>
      </FL:re>
      <FL:actions>
         <FL:action id="phone">
            <FL:caption>Find Address and Map</FL:caption>
            <FL:url>http://www.google.com/search?hl=en&amp;ie=UTF-8&amp;oe=UTF-8&amp;q={TEXT}&amp;btnG=Google+Search</FL:url>
         </FL:action>
      </FL:actions>
   </FL:smarttag>
</FL:smarttaglist>

In this example, you can use Google's search service to find additional information about the phone number, but you can substitute other popular search engines that provide similar functionality.
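The phone pattern makes the area code optional and accepts it either in parentheses or followed by a hyphen. A quick way to see which forms are covered is the following Python sketch; the phone numbers are fictitious, and Python's re module is assumed to behave like the Office engine here.

```python
import re

PHONE = re.compile(r"((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}")

text = "Call (425) 555-0100, 425-555-0100, or 555-0100."

# group(0) is the complete match, regardless of the inner groups.
numbers = [m.group(0) for m in PHONE.finditer(text)]
print(numbers)
```

All three spellings are found. A number written with periods or spaces as separators, such as 425.555.0100, would need an extended pattern.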

All these examples are rather generic, but you can also imagine more individualized recognizers. If you are a developer, you are probably using some kind of bug or anomaly tracking system to keep track of issues that need to be taken care of. In some organizations, bugs are often discussed in e-mails and Word documents. It would be straightforward to find those references, as they are typically written as Bug #123456 or bug 123456. Here's a pattern that matches these phrases.

[Bb]ug\s(#\s?)?\d{6}

By now you can probably imagine the complete smart tag definition:

<?xml version="1.0" encoding="UTF-16"?>
<FL:smarttaglist xmlns:FL="urn:schemas-microsoft-com:smarttags:list">
   <FL:name>Bugs</FL:name>
   <FL:lcid>1033</FL:lcid>
   <FL:description>Recognizes references to bugs</FL:description>
   <FL:smarttag type="urn:schemas-manualtags#bugs">
      <FL:caption>Bugs (MOSTL)</FL:caption>
      <FL:re>
         <FL:exp>[Bb]ug\s(#\s?)?\d{6}</FL:exp>
      </FL:re>
      <FL:actions>
         <FL:action id="bug">
            <FL:caption>Open the Bug Database</FL:caption>
            <FL:url>http://www.example.com/ShowBug.aspx?id={TEXT}</FL:url>
         </FL:action>
      </FL:actions>
   </FL:smarttag>
</FL:smarttaglist>

Note   The URL used in this example is just a placeholder to be replaced with whatever URL leads to your issue-tracking system.

More Sophisticated Actions

In this example, the entire recognized text (such as "Bug #123456") is passed to the URL. This may not be the desired behavior. Perhaps all you really want is to pass the six-digit number. You may not even want to navigate to a URL but perform an entirely different action instead. You can accomplish this with a little bit of custom programming.

Hand-coded smart tags are usually implemented as a combination of a custom smart tag recognizer (the COM API interface is ISmartTagRecognizer), and the associated smart tag actions (the COM API interface is ISmartTagAction). In the scenario above, the recognizer is already taken care of by the regular expression. So all you need to do to provide a true custom action is to add code for a custom set of actions by implementing the ISmartTagAction interface as defined in the smart tags type library. You can use any language that can implement COM interfaces for this task, such as Microsoft Visual Basic 6.0, Microsoft Visual Studio .NET, and Microsoft Visual FoxPro.

The basic idea is to create a COM component that you can register with Microsoft Office 2003 programs. Office 2003 calls that component and invokes methods that you define by using the ISmartTagAction interface. Every time a smart tag recognizer such as the regular expression recognizer recognizes a string, Office 2003 looks for associated actions by querying all registered SmartTagAction objects. Each action object identifies itself as belonging to a certain namespace. Whenever the namespace matches the namespace defined by the recognizer, for example, urn:schemas-manualtags#bugs, Office displays the defined custom actions in the smart tag menu.

The following listing shows a custom implementation using Visual Basic 6.0. To follow this example, create a Visual Basic 6.0 project called "BugsSmartTag" and a class called Actions. Then, add the Microsoft SmartTags 2.0 Type library to your project references. Add the following code into your Actions class:

Implements SmartTagLib.ISmartTagAction
Private Property Get ISmartTagAction_ProgId() As String
    ISmartTagAction_ProgId = "BugsSmartTag.Actions"
End Property

Private Property Get ISmartTagAction_Name(ByVal LocaleID As Long) As String
    ISmartTagAction_Name = "Bug Manager smart tag"
End Property

Private Property Get ISmartTagAction_Desc(ByVal LocaleID As Long) As String
    ISmartTagAction_Desc = "Recognizes bug numbers."
End Property

Private Property Get ISmartTagAction_SmartTagCount() As Long
    ISmartTagAction_SmartTagCount = 1
End Property

Private Property Get ISmartTagAction_SmartTagName _
    (ByVal SmartTagID As Long) As String
    ISmartTagAction_SmartTagName = "urn:schemas-manualtags#bugs"
End Property

Private Property Get ISmartTagAction_SmartTagCaption _
    (ByVal SmartTagID As Long, ByVal LocaleID As Long) As String
    ISmartTagAction_SmartTagCaption = "Bugs"
End Property

Private Property Get ISmartTagAction_VerbCount _
    (ByVal SmartTagName As String) As Long
    ISmartTagAction_VerbCount = 1
End Property

Private Property Get ISmartTagAction_VerbID _
    (ByVal SmartTagName As String, ByVal VerbIndex As Long) As Long
    ISmartTagAction_VerbID = 1
End Property

Private Property Get ISmartTagAction_VerbCaptionFromID _
    (ByVal VerbID As Long, ByVal ApplicationName As String, _
    ByVal LocaleID As Long) As String
    ISmartTagAction_VerbCaptionFromID = "Load Bug"
End Property

Private Property Get ISmartTagAction_VerbNameFromID _
    (ByVal VerbID As Long) As String
    ISmartTagAction_VerbNameFromID = "Load Bug"
End Property

Private Sub ISmartTagAction_InvokeVerb(ByVal VerbID As Long, _
    ByVal ApplicationName As String, ByVal Target As Object, _
    ByVal Properties As SmartTagLib.ISmartTagProperties, _
    ByVal Text As String, ByVal Xml As String)
    ' Put custom code here...
    MsgBox "Loading Bug Number " & Text
End Sub

The most important aspect of this class is the implementation of the read-only property SmartTagName. This property evaluates to urn:schemas-manualtags#bugs which is identical to the namespace definition in the XML recognizer. It provides the link between this Action class and the regular expression recognizer.

Once the SmartTagAction object is linked to the recognizer, things work like they do for all smart tags. Office 2003 first asks how many actions (verbs) you want to provide in the menu for the recognized smart tag. In the example, there is only one as indicated by the VerbCount property, but there could be any number.

Subsequently, Office 2003 queries names and captions for each verb defined. This is done through the VerbCaptionFromID and VerbNameFromID properties. In the example, you simply return the name of the single verb (menu item). If you wanted to make more than one verb available, you must account for that in the properties as shown in this example:

Private Property Get ISmartTagAction_VerbCaptionFromID _
    (ByVal VerbID As Long, ByVal ApplicationName As String, _
    ByVal LocaleID As Long) As String
    If VerbID = 1 Then
        ISmartTagAction_VerbCaptionFromID = "Load Bug"
    Else
        ISmartTagAction_VerbCaptionFromID = "Something Else"
    End If
End Property

Finally, all that is left is to prepare for when a user clicks one of the provided menu items. This is done in the InvokeVerb method. In the current example, all that happens is the display of a simple message box showing the entire recognized string, but at this point, you can take subsequent actions to perform more sophisticated tasks, such as trimming out unwanted text from the identified string. You can use the resulting string to navigate to a Web site in a more controlled fashion, or you can use it to invoke custom behavior, such as launching a complete application. Note also that if you had more than one verb defined, you once again must pay attention to the VerbID parameter that is passed to this method.
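The trimming step described above amounts to pulling the six digits out of the recognized text and then deciding what to do based on the verb. The article's component is written in Visual Basic 6.0; the following sketch uses Python only to show the logic. The function names are hypothetical, and the URL is the same placeholder used in the bug list file.

```python
import re

# Placeholder address from the list file, not a real bug database.
BUG_URL = "http://www.example.com/ShowBug.aspx?id="


def extract_bug_number(recognized_text):
    """Return the six-digit number in the text, or None if absent."""
    match = re.search(r"\d{6}", recognized_text)
    return match.group(0) if match else None


def invoke_verb(verb_id, recognized_text):
    """Choose a result based on the verb, as InvokeVerb would."""
    number = extract_bug_number(recognized_text)
    if number is None:
        return None
    if verb_id == 1:
        # Verb 1 corresponds to "Load Bug" in the listing above.
        return BUG_URL + number
    return "Unhandled verb %d for bug %s" % (verb_id, number)


print(invoke_verb(1, "Bug #123456"))
print(invoke_verb(2, "bug 654321"))
```

In the Visual Basic component, the same two steps would go into the InvokeVerb method in place of the message box.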

The new SmartTagAction object is now ready to be used. All that is left to do is register it with Microsoft Office 2003. To do so, you first need to know the GUID that is assigned to this COM component. You can find the GUID for your component in the registry under the HKEY_CLASSES_ROOT\BugsSmartTag.Actions\ClsId node.

Caution   Incorrectly editing the registry may severely damage your system. Before making changes to the registry, you should back up any valued data on the computer.

You must add this GUID as a new key under the smart tag Actions node in the registry (HKEY_CURRENT_USER\Software\Microsoft\Office\Common\Smart Tag\Actions). To do this, right-click the Actions node, point to New, and then click Key to create the key entry. Paste the GUID for the smart tag component as the name of the newly added key. The target registry node is shown in Figure 6.

Figure 6. Finding the GUID for a smart tag

Note   Make sure to close all Office 2003 applications before you make this change to the registry. The new actions are available the next time an Office 2003 application is launched.

Conclusion

Regular expression support makes smart tags much more powerful and, at the same time, easier to create than ever before. Regular expressions are one of the best ways to analyze strings for relatively loosely defined search criteria. While regular expressions were always available to hand-coded smart tag recognizers, the newly added regular expression support in smart tag list files provides a simple way for developers, and even power users, to identify meaningful sections and words in Office documents, and to link them to associated actions without ever writing a single line of actual code. The powerful regular expression syntax allows identifying anything from simple strings to phone numbers and even phrases.

About the Author

Markus Egger (megger@eps-software.com) is the President and Chief Software Architect of EPS Software Corp., located in Houston, Texas. He is also the founder of EPS Software Austria, located in Salzburg. Markus concentrates on consulting and the development of custom software based on Microsoft technologies. His passion lies with object-oriented technology. He is an international author and speaker, a C# MVP, and the publisher of CoDe Magazine.

© Microsoft Corporation. All rights reserved.