Languages, regions, locales, oh my!

Michael Kaplan has blogged quite a bit about our release of the Microsoft Locale Builder Beta, the final version of which is planned for use on Vista when it ships. We're pretty excited about the Locale Builder here for a whole bunch of reasons. Like the Microsoft Keyboard Layout Creator before it, the Locale Builder makes it possible for customers to create and share solutions faster and to ensure that the support available in Windows (support for formatting times, dates, currencies, and text, in the case of the Locale Builder) provides the best possible user experience across Windows applications. Straight up, you speak your language better than we do.

One of the pieces that I have been involved with since joining Microsoft is the investigation and maintenance of the NLS locale data that are used in Windows and .NET Framework locales. Collecting data in an accurate, useful way poses many challenges. A whole array of things can impact language and technology standards in a region, including many factors whose rapid rate of change can make things interesting. Some things we need to keep in mind:

Changes in politics. Governments change at every scale. New leaders are elected; new representatives are appointed; new curricula are developed for school systems; official language legislation is passed and then revoked and then passed again.

Changes in standards. As technology evolves, new standards get created. As standards organizations change, new standards get created.

Changes in language or language use. Some languages are written in multiple writing systems. New vocabulary is created. A change in one part of the linguistic system can have ripple effects everywhere else in the system.

People move around. Just in my building, I work with native speakers of Spanish, French, Hebrew, Arabic, Chinese, Catalan, Hindi, Kannada, German, Portuguese, Italian, Russian, Ukrainian, Japanese, Czech, Persian, and that's just off the top of my head. But they all work here, in Redmond. As people move around more and more, the usual pairings of language and region information inside a locale become less and less relevant. It's just not possible to predict the particular combinations that any individual user will want.

Even in a static universe, one size rarely fits all. I'm an en-US user, but I like a 24-hour clock. I'm weird like that.
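
To make that last point concrete, here is a minimal Python sketch of the idea. It is not how NLS data or the Locale Builder actually work: the LocaleData class, its fields, and the "en-US-custom" name are all made up for illustration, and the format patterns are just my guess at reasonable en-US defaults. The point is only that a locale is a bundle of formatting choices, and a user may want to keep most of the bundle while swapping out one piece.

```python
from dataclasses import dataclass, replace
from datetime import datetime


@dataclass(frozen=True)
class LocaleData:
    """Hypothetical, heavily simplified stand-in for a locale's formatting data."""
    name: str
    short_date: str  # strftime pattern used for dates
    short_time: str  # strftime pattern used for times


# Illustrative defaults for en-US: month/day/year and a 12-hour clock.
EN_US = LocaleData(name="en-US", short_date="%m/%d/%Y", short_time="%I:%M %p")

# A custom variant: same language and region data, but a 24-hour clock.
# The name "en-US-custom" is invented for this example.
EN_US_24H = replace(EN_US, name="en-US-custom", short_time="%H:%M")


def format_date(locale: LocaleData, moment: datetime) -> str:
    """Format the date part of `moment` using the locale's date pattern."""
    return moment.strftime(locale.short_date)


def format_time(locale: LocaleData, moment: datetime) -> str:
    """Format the time part of `moment` using the locale's time pattern."""
    return moment.strftime(locale.short_time)


if __name__ == "__main__":
    afternoon = datetime(2006, 5, 1, 15, 30)
    for loc in (EN_US, EN_US_24H):
        print(loc.name, format_date(loc, afternoon), format_time(loc, afternoon))
```

Both variants format the date the same way; only the time pattern differs, so the custom one shows the afternoon as 15:30 while the default uses a 12-hour clock with an AM/PM marker.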

The fact is that correct NLS data can be a moving target, and we're hoping that the Locale Builder can help address this for our customers. We're giving users the ability to change the locale data that we've shipped, not only for their own machines, but also for others, since they'll be able to share custom locales that they create. We're also giving users the ability to add new locales entirely, since we recognize that we provide just a small set of the possible languages and regions that our customers care about.

One thing I'm really interested in learning is which of these options will be more popular with our customers. Are people going to use the Locale Builder to modify existing Windows locales? If so, which locales are people changing, and which pieces of data are they changing within those locales? Or are people going to use the tool to create new locales entirely? It's important to me that we understand what we're getting wrong today and what users feel that they need to fix.