Is this possible in Word / Office or can it be developed by a third party?

Johann Petrak 1 Reputation point
2021-02-23T10:07:34.943+00:00

I am very new to MS Word/Office, coming from a scientific background where Linux is mostly used. I work on machine learning and natural language processing.

I would like to know whether something like the following is possible, or could in principle be implemented, in MS Word. I would also appreciate pointers on where to start learning how to approach the problem:

  • Show contiguous spans of text with a specific background colour, such that overlapping spans get a "mixed" colour. For example, for the text "this is some text", if "this is some" has the colour blue associated with it and "is some text" the colour yellow, the text "is some" should be shown in green.
  • Show additional information about those spans of text when they are clicked, or when an additional entry in a context menu is selected. For example, when the user clicks the "this is some" text span or invokes a context menu option for it, it should be possible to pop up a dialog window that shows a list of arbitrary key/value pairs associated with that span (e.g. "edited on: 2020-02-01; source: ...; ..."). So this relates to two questions: is it possible to associate a span of text with formatting and with user-specific, arbitrary data, and is it possible to add functionality to show that data (and possibly update it)?
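
The overlap behaviour in the first bullet can be sketched independently of Word. This is a minimal illustration, not a Word API: the function names `multiply` and `segment`, the character-offset representation of spans, and the choice of a channel-wise "multiply" blend are all assumptions made for the example. With pale highlight colours, multiplying a pale blue by a pale yellow does give a pale green.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Span = Tuple[int, int, RGB]  # (start, end, colour); end offset is exclusive


def multiply(a: RGB, b: RGB) -> RGB:
    """Blend two colours channel by channel (assumed mixing rule)."""
    return tuple(x * y // 255 for x, y in zip(a, b))


def segment(spans: List[Span]) -> List[Span]:
    """Split overlapping spans into non-overlapping segments with mixed colours."""
    # Every span boundary is a point where the set of active spans can change.
    cuts = sorted({p for start, end, _ in spans for p in (start, end)})
    result = []
    for lo, hi in zip(cuts, cuts[1:]):
        # Colours of all spans that fully cover the segment [lo, hi).
        active = [c for start, end, c in spans if start <= lo and hi <= end]
        if not active:
            continue  # gap between spans: no background colour
        colour = active[0]
        for other in active[1:]:
            colour = multiply(colour, other)
        result.append((lo, hi, colour))
    return result


text = "this is some text"
PALE_BLUE = (128, 200, 255)
PALE_YELLOW = (255, 255, 128)
spans = [(0, 12, PALE_BLUE), (5, 17, PALE_YELLOW)]  # "this is some", "is some text"

segments = segment(spans)
for lo, hi, colour in segments:
    print(repr(text[lo:hi]), colour)
```

The overlapping part "is some" comes out as (128, 200, 128), a pale green. Any other mixing rule could be substituted for `multiply` without changing the segmentation step.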

Background for this question: in natural language processing, algorithms can automatically detect spans of text that are named entities, references to objects of interest, addresses, or the like. So if I have an algorithm outside of Word that can detect those things, what is the easiest way to get information about the detected spans into Word and allow a Word user to view them and possibly interact with them?
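
To make the kind of data concrete, here is a hedged sketch of what the output of such an external algorithm might look like and how a click could be resolved to the spans under it. The `Annotation` class, the JSON layout, the labels `SpanA`/`SpanB`, and the helper `annotations_at` are hypothetical names chosen for illustration; they are not taken from any particular NLP tool or from Word.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Annotation:
    start: int                     # character offset of the first character
    end: int                       # character offset one past the last character
    label: str                     # e.g. an entity type
    features: Dict[str, str] = field(default_factory=dict)  # arbitrary key/value pairs


def annotations_at(annotations: List[Annotation], offset: int) -> List[Annotation]:
    """Return every annotation whose span contains the given character offset."""
    return [a for a in annotations if a.start <= offset < a.end]


# Hypothetical output of an external tool for the text "this is some text".
payload = """
[
  {"start": 0, "end": 12, "label": "SpanA",
   "features": {"edited on": "2020-02-01", "source": "..."}},
  {"start": 5, "end": 17, "label": "SpanB", "features": {}}
]
"""
annotations = [Annotation(**item) for item in json.loads(payload)]

# Simulate a click at offset 2 (inside "this") and list what a dialog would show.
for ann in annotations_at(annotations, 2):
    print(ann.label)
    for key, value in ann.features.items():
        print(f"  {key}: {value}")
```

A click inside the overlap (for example offset 6) would return both annotations, which is where the dialog would need to list more than one span.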

From a developer's point of view, I guess this comes down to the following two questions:

  • Is it possible to embed user-specific information about text spans, and formatting information about those spans, in the file format of a Word document?
  • Is it possible to develop some kind of add-on inside Word that accesses that information and displays the information associated with spans in a user-friendly way?
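
As a rough illustration of the first question: a .docx file is a ZIP package of XML parts, in which runs of text (`w:r`) carry their own formatting and named ranges can be marked with `w:bookmarkStart`/`w:bookmarkEnd`. The sketch below only builds a paragraph fragment with the Python standard library. The bookmark name `span_0`, the helper names, and the idea of using the bookmark name as a key into separately stored key/value data are assumptions for illustration, not an established convention; where that data would live (for example in a custom XML part of the package) is left open.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("w", W_NS)


def w(name: str) -> str:
    """Qualify a tag or attribute name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{name}"


def add_run(paragraph: ET.Element, text: str, fill: str = None) -> ET.Element:
    """Append a run of text, optionally with a background shading colour (hex RGB)."""
    run = ET.SubElement(paragraph, w("r"))
    if fill is not None:
        props = ET.SubElement(run, w("rPr"))
        ET.SubElement(props, w("shd"),
                      {w("val"): "clear", w("color"): "auto", w("fill"): fill})
    t = ET.SubElement(run, w("t"), {f"{{{XML_NS}}}space": "preserve"})
    t.text = text
    return run


paragraph = ET.Element(w("p"))
# The bookmark marks the span "this is some"; its name is a hypothetical key
# under which the span's key/value data could be looked up.
ET.SubElement(paragraph, w("bookmarkStart"), {w("id"): "0", w("name"): "span_0"})
add_run(paragraph, "this ", fill="80C8FF")    # pale blue only
add_run(paragraph, "is some", fill="80C880")  # overlap: pale green
ET.SubElement(paragraph, w("bookmarkEnd"), {w("id"): "0"})
add_run(paragraph, " text", fill="FFFF80")    # pale yellow only

fragment = ET.tostring(paragraph, encoding="unicode")
print(fragment)
```

This produces only a paragraph fragment, not a complete document. Turning it into a file that Word opens would additionally require the surrounding document part and package structure.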

If anybody knows about tools that already do something like this, or even better, knows how one would approach this, it would be very much appreciated.

This question is mainly meant to figure out whether it is worth investigating this further at all, and to get a rough impression of the effort involved.

PS: this is related to research and academic use so anything implemented would be open source but it also means that commercial, non-free solutions to this problem are not relevant and cannot be considered.

Office Development
Office Open Specifications