Microsoft Web N-Gram
Bringing you web-scale language model data. Web N-Gram is a joint project between Microsoft Bing and Microsoft Research.
Microsoft Research Speller Challenge is open for business
After a few bumps here and there, we have the site up and running. If you prefer a write-up by a...
Date: 01/20/2011
The dirty secret about large-vocabulary hashes
The first step in the n-gram probability lookup process is to convert the input into tokens as...
Date: 12/27/2010
Did you mean...Schwarzenegger?
You launch your favorite search engine and enter a term or two. Then near the top, the search engine...
Date: 12/15/2010
Well, do ya, P()?
Today we'll do a refresher on unigrams and the role of the P(<UNK>). As you recall, for...
Date: 12/13/2010
Perf tips for using the N-Gram service with WCF
The support in Visual Studio for WCF makes writing a SOAP/XML application for the Web N-Gram service...
Date: 12/06/2010
The messy business of tokenization
So what exactly is a word, in the context of our N-Gram service? The devil, it is said, is in the...
Date: 11/29/2010
Wordbreakingisacinchwithdata
For the task of word-breaking, many different approaches exist. Today we're writing about a purely...
Date: 11/22/2010
The fluid language of the Web
We prepared, as we had for the earlier dataset, the top-100K words list for the body stream for...
Date: 11/15/2010
Using the MicrosoftNgram Python Module
Over the past few posts I've shown some samples of the MicrosoftNgram Python module. Writing...
Date: 11/08/2010
Who doesn't like models?
If there ever was an overloaded term in Computer Science, it's models. For instance, my colleagues...
Date: 11/01/2010
UPDATE: Serving New Models
Today's post was delayed slightly, but we have good news: announcing the availability of...
Date: 10/25/2010
Generative-Mode API
In previous posts I wrote how the Web N-Gram service answers the question: what is the probability...
Date: 10/18/2010
Language Modeling 102
In last week's post, we covered the basics of conditional probabilities in language modeling. Let's...
Date: 10/11/2010
Language Modeling 101
The Microsoft Web N-Gram service, at its core, is a data service that returns conditional...
Date: 10/04/2010
What can data do for you?
Let's think of the scale of different lexicons, in terms of order of magnitude: 1,000 - the...
Date: 09/27/2010