Contextual spelling for French in Office 2010

At the Worldwide Partner Conference in New Orleans on 13 July 2009, we announced the launch of the Office 2010 Technical Preview. This technical preview can now be downloaded by thousands of customers. You can discover the innovations on the Office 2010 blog and watch really cool videos on www.microsoft.com/office2010. My colleague Stefanie Schiller wrote a few words about the proofing tools integrated in this Technical Preview and about some of the improvements we have made, specifically with respect to the English thesaurus.

French-speaking users will also be delighted. We have talked on multiple occasions on this blog about the English and Spanish contextual spellers that we launched in Office 2007 (which also includes such a tool for German). We have improved them all in Office 2010 and we are happy to introduce a brand-new contextual speller for French which, when added to the French spellchecker and grammar checker, will greatly improve our French-speaking users’ proofing experience.

The French contextual speller in Office 2010 will detect many mistakes that have so far gone unnoticed by the traditional proofing tools. Unlike the grammar checker, which is based on a syntactic parser, our contextual speller is based on statistical analyses of very large text corpora and on “language models”, which enable the program to compare the user’s text with huge lists of word sequences and their frequencies. Words that exist in the language but are used improperly in a given context can then be flagged. A blue squiggly line will appear under mistakes such as the following:

Ils on faim. (on → ont)

Elles son malades. (son → sont)

Quand à moi, j’avoue que je sui fier de lui. (Quand → Quant; sui → suis)

Si je peu me permettre, dans son fort intérieur, elle pense qu’elle a raison. (peu → peux; fort → for)

Se test montre que le correcteur ne fonctionne pas trop mal. (Se → Ce)

L’installation de la fosse sceptique a pris plus de temps que prévu. (sceptique → septique)

Il arrive cet après midi. (après midi → après-midi)

Mon frère ma dit qu’il ne viendrait pas. (ma → m’a)

Il y a long temps que je l’aime, jamais je ne l’oublierai… (chanson populaire) (long temps → longtemps)

En temps que client de l’hôtel, vous avez gratuitement accès à l’Internet. (temps → tant)

 

The screenshot below shows this contextual speller in action:

[Screenshot: the French contextual speller in action]

What is a “contextual speller”? As you know, the traditional Office spellchecker flags the odd typo (omission of a letter, transposition of two letters, etc.) with a red squiggly line. The grammar checker deals with agreement mistakes (such as between a verb and its subject, or agreement in number and gender between a noun and an adjective in French). Mistakes involving words that are pronounced alike but spelled differently are much harder to detect, however. Anyone who knows a bit of French knows how frequently people (native and non-native speakers alike) mix up similar-sounding words like son/sont or on/ont. If I write “Ils on faim” (meant as “Ils ont faim”, they are hungry), a grammar checker based on a syntactic parser has difficulty detecting the mistake (“on” should read “ont”) because the erroneous sentence is made up of a pronoun (Ils), followed by another pronoun (on, instead of the correct verb form ont) and a noun (faim). It is hard to make sense of this structure, since it is not a traditional agreement problem as in “Ils mange du pain” (they eat bread), where “mange” (a singular verb) should read “mangent” (plural form).
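To make the idea more concrete, here is a minimal sketch in Python of how word-sequence frequencies can catch “Ils on faim”. It is only a toy illustration, not the way the Office speller is implemented: the confusion sets, the bigram counts, the function names and the `ratio` threshold are all invented for the example, and a real language model would be built from a very large corpus.

```python
from collections import Counter

# Hypothetical confusion sets: real French words that sound alike.
CONFUSION_SETS = [{"on", "ont"}, {"son", "sont"}]

# Toy bigram counts standing in for a corpus-based language model.
# All numbers are invented for illustration.
BIGRAM_COUNTS = Counter({
    ("ils", "ont"): 950, ("ils", "on"): 1,
    ("ont", "faim"): 120,
    ("elles", "sont"): 900, ("elles", "son"): 1,
    ("sont", "malades"): 80,
    ("son", "chat"): 200,
})


def context_score(prev, word, nxt):
    """Product of the add-one smoothed counts of the two bigrams around `word`."""
    return (BIGRAM_COUNTS[(prev, word)] + 1) * (BIGRAM_COUNTS[(word, nxt)] + 1)


def flag_contextual_errors(sentence, ratio=10):
    """Return (word, suggestion) pairs where a sound-alike fits the context far better."""
    tokens = sentence.lower().rstrip(".").split()
    padded = ["<s>"] + tokens + ["</s>"]  # sentence boundary markers
    flags = []
    for i, word in enumerate(tokens):
        prev, nxt = padded[i], padded[i + 2]
        for confusion_set in CONFUSION_SETS:
            if word not in confusion_set:
                continue
            current = context_score(prev, word, nxt)
            for alternative in sorted(confusion_set - {word}):
                # Flag only when the alternative is much more likely in this context.
                if context_score(prev, alternative, nxt) >= ratio * current:
                    flags.append((word, alternative))
    return flags


print(flag_contextual_errors("Ils on faim."))
print(flag_contextual_errors("Ils ont faim."))
```

With these toy counts, the first call reports that “on” should probably be “ont”, while the correct sentence produces no flag. Requiring the alternative to beat the written word by a wide margin (the assumed `ratio`) is one simple way to keep false flags rare, at the cost of missing some mistakes.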

Of course, you should not expect this tool to flag every kind of mistake. No existing tool can do that, and one that tried would probably produce a lot of false flags, or false positives, which tend to irritate the user. I discussed the notions of precision and recall when I blogged about an academic evaluation of our Office 2007 contextual speller. When we developed our tool, we constantly tried to avoid false flags, and our tool has very high precision, which means it rarely makes mistakes when it flags something (in fact, it is right nearly all the time, but there will of course always be mistakes that go undetected). I feel that this new contextual speller will quickly prove to be an indispensable tool for many an Office 2010 user who writes documents in French. It will certainly be a very useful complement to the range of proofing tools we make available to them.
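For readers who have not met precision and recall before, the short sketch below shows how the two figures are computed. The function name and the counts are invented purely for illustration; they are not measurements of the Office 2010 speller.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from counts of flags and missed mistakes.

    true_positives:  flags raised on real mistakes
    false_positives: flags raised on text that was actually correct
    false_negatives: real mistakes that were not flagged
    """
    flagged = true_positives + false_positives
    actual_mistakes = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual_mistakes if actual_mistakes else 0.0
    return precision, recall


# Invented example: 90 correct flags, 2 false flags, 60 missed mistakes.
precision, recall = precision_recall(90, 2, 60)
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```

In this made-up case almost every flag is justified (high precision), even though a good share of the mistakes go unflagged (lower recall), which is the trade-off described above.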

Thierry Fontenelle

Microsoft Natural Language Group – Program Manager