Over the years a number of threads have discussed how to extract a list of the abbreviations used in a Word document.* The basic technique is fairly straightforward: you use Advanced Find with "Use wildcards" turned on and search for a pattern like "[A-Z]{2,}", which matches strings of two or more uppercase letters. A variant on this is "\([A-Z]{2,}\)", which matches strings of two or more uppercase letters surrounded by parentheses (the backslashes are needed because unescaped parentheses are grouping operators in Word's wildcard syntax). These work pretty well and do give you a list of the abbreviations used.
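For comparison, here is a minimal sketch of the same listing step outside Word, assuming the document text has already been saved or pasted as plain text. It uses Python's regular-expression syntax, not Word wildcards. The function name list_abbreviations and the sample sentence are made up for illustration, and the \b word-boundary anchors are an addition that the Word pattern does not have.

```python
import re

def list_abbreviations(text):
    """Return distinct runs of two or more capital letters, in order of first appearance."""
    seen = []
    # \b keeps the match to whole words made only of capitals (an assumption;
    # the Word wildcard pattern [A-Z]{2,} has no such anchor).
    for match in re.finditer(r"\b[A-Z]{2,}\b", text):
        if match.group() not in seen:
            seen.append(match.group())
    return seen

sample = "The International Monetary Fund (IMF) and the WHO met. The IMF agreed."
print(list_abbreviations(sample))  # ['IMF', 'WHO']
```

Like the Word search, this picks up any run of capitals, so headings typed in all caps would be listed too.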
I wondered whether one could go a step further and try to extract the definitions of the abbreviations as well. This is very difficult in general, but might be possible in the (common) case where the abbreviation is first defined and then given in parentheses.
An example: "Responding to these concerns, the International Monetary Fund (IMF) issued a statement that ..."
So the idea would be to match the pattern (IMF) using "\([A-Z]{2,}\)" but also capture the words preceding (IMF). Of course you don't know in general how many words to capture, but in practice most abbreviations are between 2 and 5 characters long and each letter usually stands for one word, so capturing the preceding 5 words should be enough to give you most of the definition.
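To make the idea concrete, here is a rough sketch of it in Python's regular-expression syntax rather than Word wildcards, so it is not an answer to the Word question itself. It assumes the text is available as a plain string. The names MAX_WORDS, PATTERN and find_definitions are hypothetical, and the definition of a "word" (a letter followed by letters, digits, apostrophes or hyphens) is a simplifying assumption.

```python
import re

MAX_WORDS = 5  # assumed upper limit on how many preceding words to capture

# Group 1: one to MAX_WORDS words, each followed by whitespace.
# Group 2: two or more capitals inside literal (escaped) parentheses.
PATTERN = re.compile(r"\b((?:[A-Za-z][\w'-]*\s+){1,%d})\(([A-Z]{2,})\)" % MAX_WORDS)

def find_definitions(text):
    """Map each parenthesised abbreviation to the words captured before it."""
    results = {}
    for match in PATTERN.finditer(text):
        words = match.group(1).split()
        # Keep only the first place each abbreviation is introduced.
        results.setdefault(match.group(2), " ".join(words))
    return results

sample = ("Responding to these concerns, the International Monetary Fund (IMF) "
          "issued a statement that ...")
print(find_definitions(sample))  # {'IMF': 'the International Monetary Fund'}
```

The capture stops at punctuation, which is why the example returns four words instead of five. It still includes the leading "the", so the result is a candidate definition that needs trimming by hand.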
What I can't figure out is how to construct a pattern which is "any five words" followed by "\([A-Z]{2,}\)".
I've looked in Jack Lyon's Wildcard Cookbook for Microsoft Word (a wonderful book, by the way) and tried things like "([A-z]@) ([A-Z,]{2,})", but that just sends Word into what seems like an endless loop; eventually Word stops responding and has to be closed down.
I've run out of ideas for how to write this pattern. Suggestions welcome.
Stephen Yeo
* e.g. https://answers.microsoft.com/en-us/msoffice/forum/all/is-there-a-way-to-search-your-document-for/7d94ea8d-2be8-488d-be03-f0b5a8043f87