Business AI Tops the SQuAD Leaderboard

Machine Reading Comprehension (MRC) is a relatively new area of natural language processing (NLP) and an important part of AI that researchers around the world are working on. The MRC research community has been collaborating around several recently proposed datasets, e.g., the Stanford Question Answering Dataset (SQuAD) and the CNN/Daily Mail QA dataset (CNN-DM), which was released by Google's DeepMind. The dataset that has generated the most activity in the research community since it was unveiled in 2016 is SQuAD. If you are not familiar with SQuAD, its website describes it as a "reading comprehension dataset, consisting of questions posed by crowd workers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage." SQuAD also maintains a leaderboard where we can track the rapid progress being made in MRC and see how our latest models are doing relative to the state of the art.
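To make the "span" idea concrete, here is a minimal Python sketch of a SQuAD-style record. The field names follow the published SQuAD JSON files (flattened here for brevity), but the passage and question are invented for illustration and the helper `answer_span` is hypothetical.

```python
def answer_span(answer):
    """Return (start, end) character offsets of an answer in its passage."""
    start = answer["answer_start"]
    end = start + len(answer["text"])
    return start, end


# Invented passage and question; only the field names mirror the SQuAD files.
context = "The Amazon rainforest covers much of the Amazon basin of South America."
answer_text = "South America"

record = {
    "context": context,
    "question": "Which continent is the Amazon basin part of?",
    # The answer is stored as its text plus a character offset into the passage.
    "answers": [{"text": answer_text, "answer_start": context.find(answer_text)}],
}

start, end = answer_span(record["answers"][0])
extracted = record["context"][start:end]
print(start, end, extracted)
```

Because every answer is a slice of the passage, a model only has to predict where the span starts and where it ends.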

We recently tested the approach we are using within Business AI for our customer care solution on SQuAD, and as of this writing we are happily on top of the leaderboard. That is exciting, but we also know that the top spot changes frequently. What is most fulfilling for the team is that within Microsoft's Business AI Solutions group, we have a science team that consists of some of the world's best NLP/MRC researchers. This team is unique in that most of its members are AI researchers from Microsoft Research (MSR). Our science team works closely with the product group to produce applied research that ships as part of the Dynamics 365 AI solutions.

Advances in MRC that Microsoft has made in the last year are listed below.

  • In September 2016, the Maluuba team introduced the EpiReader, which used an attention sum reader. Attention in MRC corresponds to the similarity between a query vector and the potential answer vectors.
  • In March 2017, our Business AI team introduced ReasoNet, which added multi-step inference via reinforcement learning. Much as a human would when trying to answer a question, the neural network goes over the query, documents and answers several times until it concludes that it is ready to answer. The number of turns it takes is determined dynamically.
  • In July 2017, MSR Asia introduced R-NET, which answers questions about a passage using self-matching. First, a recurrent neural network is used to get the question-aware portions of the passage. Then a self-matching attention mechanism is used to refine the representation, which leads to pointers to the answer.
  • In September 2017, our Business AI team introduced FusionNet, which is currently the top single/ensemble MRC model on the SQuAD leaderboard. A paper explaining the details of this model is coming.
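As a rough illustration of the attention and self-matching ideas mentioned in the list above, here is a small Python sketch. It assumes plain dot-product similarity followed by a softmax. That is a simplification chosen for readability, not the formulation used in EpiReader, R-NET or FusionNet, and the function names and vectors are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, candidates):
    """Weights over candidates, based on similarity to the query vector."""
    return softmax([dot(query, c) for c in candidates])


def self_matching(passage):
    """Let every passage vector attend over the whole passage (itself included)
    and replace it with the attention-weighted average. Assumed simplification."""
    refined = []
    for vec in passage:
        weights = attention(vec, passage)
        refined.append([
            sum(w * p[i] for w, p in zip(weights, passage))
            for i in range(len(vec))
        ])
    return refined


# Toy vectors, invented for illustration.
query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

weights = attention(query, candidates)
refined = self_matching(candidates)
print(weights)
```

The candidate most similar to the query receives the largest weight. Self-matching applies the same operation with the passage acting as its own query.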

You can see the different positions we hold on the leaderboard based on our previous research.

For FusionNet we trained our models using the Adversarial SQuAD dataset. Adversarial SQuAD adds a confusing or misleading sentence to the paragraph. It is easier for machines to remember than to reason, so including adversarial data lets us test how well our model reasons over data that more closely resembles what we see in our AI solutions. This is part of the reason why it is fun for researchers to be embedded in a product team.

Innovations that have been applied to the SQuAD dataset are also used in our customer care solution, although the context is different. SQuAD is great for finding factoids, while our solution does multi-turn question answering over structured databases and unstructured data. SQuAD tasks search for an answer in a given passage where no external knowledge is needed, and matching can often work. Business AI tasks require complex reasoning: they deal with a wide variety of query/answer types over a given corpus, including having to make inferences across sentences and passages. Also, due to the multi-turn dialogs in our virtual agents, being able to leverage external knowledge to get to the right answer is important.
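A toy example can show how an added misleading sentence affects a system that relies on matching alone. The Python sketch below uses a deliberately naive word-overlap matcher. The sentences, the question and the `best_sentence` helper are all invented for illustration. This is neither how the adversarial data is constructed nor how FusionNet works.

```python
import re


def tokens(text):
    """Lowercased set of word tokens, punctuation removed."""
    return set(re.findall(r"\w+", text.lower()))


def best_sentence(question, sentences):
    """Pick the sentence sharing the most words with the question."""
    q = tokens(question)
    return max(sentences, key=lambda s: len(q & tokens(s)))


question = "Which city hosted the conference in 2015?"

original = [
    "Lisbon welcomed the conference attendees in 2015.",  # holds the answer
    "The event drew researchers from many countries.",
]

# A misleading sentence that echoes the question's wording.
distractor = "The city of Oslo hosted the trade conference in 2016."

clean_pick = best_sentence(question, original)
fooled_pick = best_sentence(question, original + [distractor])

print(clean_pick)
print(fooled_pick)
```

On the original passage the matcher selects the sentence containing the answer. Once the distractor is appended, it selects the distractor instead, because that sentence shares more words with the question. A model that reasons about the year, rather than counting shared words, would not be misled in this way.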

It is still early for AI, and we know many more advances will be needed to keep us on top of the leaderboard. We will continue to work closely with the research community to make advances in MRC and use the innovation in AI to augment human ingenuity.