Is this possible in Word / Office or can it be developed by a third party?
I am very new to MS Word/Office, coming from a scientific background where mostly Linux is used. I work on machine learning and natural language processing.
I would like to know whether something like the following is possible, or could in principle be implemented, in MS Word, and I would appreciate pointers on where to start learning how to approach the problem:
- Show contiguous spans of text with a specific background colour, such that overlapping spans get a "mixed" colour. For example, for the text "This is some text", if "This is some" is associated with blue and "is some text" with yellow, then "is some" should be shown in green.
- Show additional information about those spans of text when they are clicked or when an additional context-menu entry is selected. For example, when the user clicks the "This is some" span or invokes a context-menu option for it, it should be possible to pop up a dialog window that shows a list of arbitrary key/value pairs associated with that span (e.g. "edited on: 2020-02-01; source: ...; ..."). So this comes down to two questions: is it possible to associate a span of text with formatting and with user-specific, arbitrary data, and is it possible to add functionality to show that data (and possibly update it)?
Background for that question: in natural language processing, algorithms can automatically detect spans of text that are named entities, references to objects of interest, addresses, or the like. So if I have some algorithm outside of Word that can detect those things, what is the easiest way to get information about the detected spans into Word and allow a Word user to view them and possibly interact with them?
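To make the overlapping-span requirement from the first bullet concrete, here is a minimal Python sketch of what I have in mind on the detection side. It cuts overlapping spans into non-overlapping segments and records which colours cover each segment. The `(start, end, colour)` tuple layout, the function name `split_into_segments` and the `MIXED` lookup table are my own assumptions for illustration, not anything Word provides.

```python
from typing import List, Tuple

# Assumed layout: (start offset, end offset exclusive, colour label)
Span = Tuple[int, int, str]
Segment = Tuple[int, int, Tuple[str, ...]]


def split_into_segments(spans: List[Span]) -> List[Segment]:
    """Cut possibly overlapping spans into non-overlapping segments.

    Each segment carries the sorted tuple of colour labels covering it.
    """
    # Every span start or end is a point where the set of labels can change.
    boundaries = sorted({pos for start, end, _ in spans for pos in (start, end)})
    segments: List[Segment] = []
    for left, right in zip(boundaries, boundaries[1:]):
        labels = tuple(sorted(
            label for start, end, label in spans
            if start <= left and right <= end
        ))
        if labels:  # skip gaps that no span covers
            segments.append((left, right, labels))
    return segments


text = "This is some text"
spans = [(0, 12, "blue"), (5, 17, "yellow")]

# Hypothetical lookup from a label combination to the colour to display.
MIXED = {("blue",): "blue", ("yellow",): "yellow", ("blue", "yellow"): "green"}

for left, right, labels in split_into_segments(spans):
    print(repr(text[left:right]), "->", MIXED[labels])
```

Each resulting segment would then only need a single background colour, which seems easier to express in a document format than truly overlapping regions.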
From a developer's point of view, I guess this comes down to the following two questions:
- Is it possible to embed user-specific information about text spans, and formatting information about those spans, in the file format of a Word document?
- Is it possible to develop some kind of add-on inside Word to access that information and display the information associated with spans in a user-friendly way?
If anybody knows of tools that already do something like this or, even better, knows how one would approach this, I would very much appreciate hearing about it.
This question is mainly meant to figure out whether it is worth investigating this further at all, and to get a rough impression of the effort involved.
PS: this is related to research and academic use so anything implemented would be open source but it also means that commercial, non-free solutions to this problem are not relevant and cannot be considered.
For the file format side of this question, it sounds like you might be interested in using customXml parts, and possibly SDTs (structured document tags), in your documents. However, for the overlapping colour question, there likely isn't a single element or attribute that covers this. In general, your question is more of a "how-to" or design advice question, which is better suited to a site like StackOverflow (openxml), where you can get opinions and experienced advice on different approaches. You can read about customXml parts in ISO/IEC 29500-1, sections 17.5 and 23.
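As a rough illustration of the kind of payload a customXml part could carry, the Python sketch below serializes key/value pairs per span into a small XML document. The namespace `urn:example:span-annotations`, the element names and the function `build_annotation_part` are hypothetical; the standard does not prescribe this schema. One possible arrangement, which you would need to verify against the specification, is to wrap each span in an SDT whose tag value matches the `id` used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the custom data; choose your own in practice.
NS = "urn:example:span-annotations"
ET.register_namespace("sa", NS)


def build_annotation_part(annotations):
    """Serialize span annotations to an XML string for a custom XML part.

    `annotations` is a list of dicts of the assumed form
    {"id": "<span id>", "data": {"<key>": "<value>", ...}}.
    """
    root = ET.Element(f"{{{NS}}}annotations")
    for ann in annotations:
        span = ET.SubElement(root, f"{{{NS}}}span", {"id": ann["id"]})
        for key, value in ann["data"].items():
            item = ET.SubElement(span, f"{{{NS}}}item", {"key": key})
            item.text = value
    return ET.tostring(root, encoding="unicode")


# Example values are made up for illustration.
xml_text = build_annotation_part([
    {"id": "span-1", "data": {"edited-on": "2020-02-01", "source": "example-tagger"}},
])
print(xml_text)
```

This only produces the XML content. Adding it to a .docx package as a customXml part, and reading it back from an add-in, are separate steps not shown here.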
I hope this helps.
Sr Escalation Engineer
Microsoft Open Specifications