WinFS 101: Introducing the New Windows File System
Thomas Rizzo
Microsoft Corporation
March 17, 2004
UPDATE: In spite of what may be stated in this content, "WinFS" is not a feature that will come with the Longhorn operating system. However, "WinFS" will be available on the Windows platform at some future date, which is why this article continues to be provided for your information.
Summary: Tom Rizzo launches his new column with an overview of why the new "Longhorn" storage subsystem (code-named "WinFS") is needed, what WinFS promises to do to help solve our data-overload problems, and what his column promises to deliver in the coming months.
Welcome to my new column, The WinFS Files! This column is aimed at helping you, the developer, understand more about the forthcoming technological innovations in the next release of Microsoft Windows, code-named "Longhorn," specifically the revolutionary new file system code-named "WinFS." As this column progresses, we'll take a look at the data model, feature set, and programming paradigms that characterize WinFS. This initial column gives you a broad overview of what is new in WinFS and how you can start learning about this new technology.
Why WinFS?
In the technology industry, a "perfect storm" is gathering: a combination of trends and technologies that will enable the next quantum leap in the way you develop and work with your information. This perfect storm combines three forces: hardware advancements, rapid growth in the amount of digitally born data, and the explosion of schemas and standards in information management.
Hardware Advancements
If we look at hardware advancements over the last few years, we can't help but notice the explosive growth of certain types of hardware. Everyone knows Moore's law: the number of transistors on a chip doubles roughly every 18 months. What a lot of people forget is that network bandwidth and storage technologies are growing even faster than Moore's law would suggest. Network bandwidth has grown at a furious pace, both within corporations and for individuals connecting to global networks from their homes. Storage capacity has increased dramatically over the last few decades, on both server and client machines. In 1983, IBM's PC XT shipped with a 10-megabyte hard disk; today, laptops come standard with 60-80 gigabyte hard drives. In the next few years, it's not inconceivable that laptops will have a terabyte or more of storage. With storage growing at this rate, managing all the data that people create, store, and search every day becomes a problem in its own right.
Digitally Born Data
Most data that people work with today is born digitally. For example, rather than starting this article on a piece of paper, I started writing it on my laptop in Microsoft Word. E-mail, electronic faxes, digital media, calendars, Microsoft Office documents, voice mail, and many other types of information are now created and stored electronically. In fact, according to a 2003 University of California, Berkeley study, over 5 million terabytes (5 exabytes) of new information were created in 2002. Ninety-two percent of that information was stored on magnetic media, mostly client hard disks. Over 400,000 terabytes of e-mail were sent and stored in 2002. Combine the growing raw power of hardware and software with the ability of computers to connect, download, process, and store far more information than before, and it becomes clear how important it is to manage our digital data effectively, both the data we create ourselves and the data we receive from others.
Data Standards and Schematized Data
The explosion of data standards and schematized data is the final piece of the puzzle driving a new way of thinking about information management. Over the last few decades, while moving to digital data, many corporations and industry groups wanted their data to be modeled after their real-world business processes.
In the beginning, computers could not handle the complexities of modeling and automating many business processes. However, with faster computers, better programming languages, and better data technologies, automation of business processes became a way for corporations to streamline their operations, especially with the advent of enterprise resource planning (ERP), customer relationship management (CRM), sales force automation (SFA), and other types of enterprise applications. With these new types of applications, data suddenly became more complex—but at the same time, data had better structure that was more useful for businesses. Rather than storing opaque binary data or simplistic data models, database systems could relate different types of complex data together. For example, ERP systems understand and can model the complexities of general ledgers, human resources, and sales systems. Since standardized schemas described the data in the system, corporations could ask interesting questions about their business and query their systems for that information.
However, working across systems is still a problem in the industry. Not only is it hard to integrate systems within a corporation, but integrating systems across corporations is even harder. This has sparked the growth of enterprise application integration (EAI) vendors who help customers navigate the sea of schematized data across heterogeneous systems. Recognizing this integration problem, the industry has looked to XML and XML Web services as a way to help companies work across multiple systems and multiple organizations. These newer standards help solve data integration problems and are helping to democratize data throughout and across organizations.
What Is WinFS?
To prepare for this perfect storm of technologies, Microsoft has invested heavily in building the next generation of the Windows file system, code-named WinFS. The WinFS product team follows three core tenets in reinventing the Windows file system: Enable people to Find, Relate, and Act on their information. Let's take a look at what each of these principles means and then we can drill into some detail on the technologies that allow WinFS to meet these goals.
Find
According to the market and information technology research firm IDC, knowledge workers spend about 15 to 30 percent of their time looking for information. In a typical eight-hour day (yes, I know, who works a typical eight-hour day?), that amounts to roughly 1.2 to 2.4 hours spent looking for information. IDC also estimates that at least 50 percent of Web searches fail. Improving the speed and accuracy of finding information is one of the key goals for WinFS, but simply being a better search engine is not. Finding information goes beyond crawling and indexing content: information today carries rich semantics, such as the relationships among pieces of information, and WinFS will provide functionality for working with those semantics. As a result, searching may not be the primary way that WinFS users find most of their information.
In addition, file system technology has not had a major overhaul in over ten years. New types of data beyond binary files and simple metadata have appeared, such as multimedia and new forms of communication and collaboration. The current file system does not know how to collect and find information within these new types of data. We are building WinFS to extend the file system so that it understands these new types of information and, in turn, provides richer capabilities for working with them.
Relate
Everyone understands the relationships that exist in their data, but today's software does a poor job of storing or exploiting those relationships. For example, I know that a particular document was discussed at a particular meeting by a particular person, say George. I know George wrote that document, and I know of another document George wrote that may interest me. How do I find that other document today? I have to search everywhere: through my e-mail, through my file system, and in all my usual storage spots. With WinFS, data relationships are built into the system, linking together all the different types of data that people work with, including custom data from the applications you write. Users can traverse these relationships and explore their data in richer ways. Furthermore, you can display these related items graphically in your WinFS-aware applications so that users better understand how their data fits together, whether that data is stored by your application, someone else's application, or a built-in Windows program.
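To make the George example concrete, here is a minimal sketch of items linked by typed relationships, plus a two-hop traversal from a meeting to the other documents written by the author of the document discussed there. It is written in Python for illustration only: the Item and Store classes, the "Authored" and "Discussed" relationship names, and the sample data are hypothetical and are not the WinFS API, which this column has not yet covered.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a stored item: an ID, a type, and a title."""
    item_id: str
    kind: str  # e.g. "Person", "Document", "Meeting"
    title: str


@dataclass
class Store:
    """Items plus typed, directed relationships (illustrative only, not the WinFS API)."""
    items: dict[str, Item] = field(default_factory=dict)
    links: list[tuple[str, str, str]] = field(default_factory=list)  # (source, relation, target)

    def add(self, item: Item) -> Item:
        self.items[item.item_id] = item
        return item

    def relate(self, source: Item, relation: str, target: Item) -> None:
        self.links.append((source.item_id, relation, target.item_id))

    def targets(self, source: Item, relation: str) -> list[Item]:
        """Follow `relation` forward from `source`."""
        return [self.items[t] for s, r, t in self.links
                if s == source.item_id and r == relation]

    def sources(self, target: Item, relation: str) -> list[Item]:
        """Follow `relation` backward to `target`."""
        return [self.items[s] for s, r, t in self.links
                if t == target.item_id and r == relation]


def other_documents_by_authors(store: Store, document: Item) -> list[Item]:
    """Find the document's authors, then every other document they authored."""
    found: list[Item] = []
    for author in store.sources(document, "Authored"):
        for doc in store.targets(author, "Authored"):
            if doc != document and doc not in found:
                found.append(doc)
    return found


if __name__ == "__main__":
    store = Store()
    george = store.add(Item("p1", "Person", "George"))
    plan = store.add(Item("d1", "Document", "Q3 Plan"))
    memo = store.add(Item("d2", "Document", "Budget Memo"))
    review = store.add(Item("m1", "Meeting", "Planning Review"))
    store.relate(george, "Authored", plan)
    store.relate(george, "Authored", memo)
    store.relate(review, "Discussed", plan)

    # Meeting -> discussed document -> its author -> the author's other documents.
    for doc in store.targets(review, "Discussed"):
        print([d.title for d in other_documents_by_authors(store, doc)])  # ['Budget Memo']
```

Keeping relationships as explicit (source, relation, target) records, rather than burying them inside each item, is what lets one traversal answer a question that spans data owned by different applications.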
Act
So far, I have talked about how you can store all your data in WinFS and create relationships among that data in new and more intuitive ways. However, if that were all the system could do, WinFS would deliver only half of its innovation. One key capability people want in any data system is help turning their data into useful information that they can act on. PC users spend much of their day acting as digital clerks for their data, manually sorting, filtering, categorizing, and stack-ranking it. WinFS will provide digital agents that help people move from being digital clerks to being decision makers about their data. Automating tedious data activities is one of the key ways that WinFS will help with information overload, so that only relevant or important information bubbles up to the user.
For example, you may want to know when an e-mail arrives from an author related to a business document you are working on, one that requires immediate turnaround. However, you're not sitting at your desk; you're in a meeting with only your cell phone. You don't want to miss the e-mail, and you need to talk with the sender as soon as possible. That conversation requires setting up a 30-minute phone briefing to hash out any issues in your business document (which is due by the end of the day).
WinFS rules are a built-in component of the system that let you tell the system how to work with, sort, and deliver your data. Using WinFS rules, you could create a rule that works on both your data and your data relationships. WinFS rules can also work with other Windows applications to notify you, for example by sending a page to your cell phone. Finally, a WinFS rule could help schedule the phone meeting by checking your calendar for free time during the day and then automatically creating a meeting at the next available time. The integrated WinFS rules technology turns the data stored in WinFS (or even replicated into WinFS) into active data. Active data helps drive better business decisions by bringing important information to your attention as soon as it enters the system. This technology can help tame information overload by automating many tasks that we perform manually on our data today.
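The rule scenario above follows a classic event-condition-action pattern. The following Python sketch illustrates that pattern under stated assumptions: the Rule and Email types, the related_authors set, the 11:00 "current time," and the calendar entries are all hypothetical, and paging is simulated by appending to a list. It is not the WinFS rules API; it only shows how a condition over related data can trigger a notification plus an automatic booking at the next free 30-minute slot.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class Email:
    sender: str
    subject: str


@dataclass
class Rule:
    """Hypothetical event-condition-action rule (not the WinFS rules API)."""
    name: str
    condition: Callable[[Email], bool]
    action: Callable[[Email], None]


def run_rules(rules: list[Rule], email: Email) -> list[str]:
    """Evaluate every rule against an incoming e-mail; return the names that fired."""
    fired = []
    for rule in rules:
        if rule.condition(email):
            rule.action(email)
            fired.append(rule.name)
    return fired


def next_free_slot(busy: list[tuple[datetime, datetime]], not_before: datetime,
                   day_end: datetime, length: timedelta) -> Optional[datetime]:
    """Return the earliest start >= not_before that leaves `length` free before day_end."""
    cursor = not_before
    for start, end in sorted(busy):
        # Only the part of the gap that falls before day_end counts.
        if min(start, day_end) - cursor >= length:
            return cursor
        cursor = max(cursor, end)
    return cursor if day_end - cursor >= length else None


if __name__ == "__main__":
    day = datetime(2004, 3, 17)
    busy = [(day.replace(hour=9), day.replace(hour=10)),
            (day.replace(hour=10, minute=30), day.replace(hour=12)),
            (day.replace(hour=13), day.replace(hour=15))]
    related_authors = {"george@example.com"}  # hypothetical: authors linked to the urgent document
    pages: list[str] = []                     # simulated pager messages

    def page_and_schedule(email: Email) -> None:
        pages.append(f"Urgent mail from {email.sender}: {email.subject}")
        slot = next_free_slot(busy, day.replace(hour=11), day.replace(hour=17),
                              timedelta(minutes=30))
        if slot is not None:
            busy.append((slot, slot + timedelta(minutes=30)))
            pages.append(f"Call booked at {slot:%H:%M}")

    rule = Rule("urgent-doc-mail", lambda e: e.sender in related_authors, page_and_schedule)
    print(run_rules([rule], Email("george@example.com", "Re: Q3 Plan")))  # ['urgent-doc-mail']
    print(pages)
```

Separating the condition from the action keeps each rule declarative: the same "e-mail from a related author" test could drive a different action, such as filing the message, without changing the rule runner.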
Behind the Technology
Now that we've talked about the philosophy behind WinFS, let's talk about some of its core technologies. In this inaugural column, I'll provide a high-level overview of each of the technologies. Over the coming months, this column will drill into each area and show you, the developer, how you can build applications on the new WinFS technologies.
From a technology standpoint, WinFS is made up of five components: Core WinFS, Data Model, Schemas, Services, and APIs. Figure 1 shows a more detailed view of the building blocks in these components.
Figure 1. The core WinFS building blocks
Core WinFS
Core WinFS is made up of the core services that you would expect from a file system. Think of Core WinFS as the fundamentals, which include operations and file system services. Some examples are security, manageability, Win32 file access support, import/export, quotas, and so on.
Data Model
Moving beyond the core services, the Data Model provides some of the technical innovations I mentioned earlier, including the basic item structure, relationships, and the ability to extend both items and relationships.
Schemas
Without built-in schemas, WinFS would be no better than the existing file system, since it would not understand your data in richer ways or provide a more structured way to handle your data's metadata. WinFS includes schemas for your everyday information such as documents, e-mail, appointments, tasks, media, audio, video, and more. WinFS also includes system schemas that cover configuration, programs, and other system-related data.
Services
Synchronization and rules fall into the services area of WinFS. These technologies "sit on top" of WinFS to provide you with capabilities that extend beyond the fundamentals of the system. Synchronization will enable you to synchronize WinFS systems across a network, as well as build synchronization adapters to synchronize WinFS to other systems. For example, you may want to synchronize contact information from your CRM system to WinFS so that you can relate that data to other data in WinFS or work with that data offline through WinFS. Synchronization adapters can be bi-directional, so any changes made to the data in WinFS can be synchronized back to the other partner system.
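A bi-directional adapter must decide which side wins when the same record changes in both places. The article does not say how WinFS resolves such conflicts, so the Python sketch below substitutes a simple, well-known policy: last-writer-wins using Lamport-style logical clocks, with the replica name as a deterministic tie-breaker. The Replica and Version types, the sync function, and the contact data are hypothetical; deletions and field-level merging are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class Version:
    clock: int    # logical (Lamport-style) modification time, compared first
    replica: str  # deterministic tie-breaker when clocks are equal


@dataclass
class Replica:
    """Hypothetical data store (e.g. a CRM system or a local store) with versioned records."""
    name: str
    records: dict[str, tuple[str, Version]] = field(default_factory=dict)
    clock: int = 0

    def put(self, key: str, value: str) -> None:
        self.clock += 1
        self.records[key] = (value, Version(self.clock, self.name))


def sync(left: Replica, right: Replica) -> None:
    """Two-way, last-writer-wins merge; afterwards both replicas hold identical records."""
    for key in set(left.records) | set(right.records):
        mine, theirs = left.records.get(key), right.records.get(key)
        if mine is None or (theirs is not None and theirs[1] > mine[1]):
            winner = theirs
        else:
            winner = mine
        left.records[key] = winner
        right.records[key] = winner
    # Advance both clocks past everything seen, so later edits are ordered after this sync.
    left.clock = right.clock = max(left.clock, right.clock)


if __name__ == "__main__":
    crm, local = Replica("crm"), Replica("winfs")
    crm.put("contact:42", "Ana Diaz, 555-0100")    # created in the CRM system
    sync(crm, local)                                # contact flows into the local store
    local.put("contact:42", "Ana Diaz, 555-0199")  # edited offline on the local side
    sync(crm, local)                                # the edit flows back to the CRM system
    print(crm.records["contact:42"][0])             # Ana Diaz, 555-0199
```

Last-writer-wins is easy to reason about but silently discards the losing edit; a production adapter would more likely detect conflicts and surface them, which this sketch does not attempt.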
APIs
As a developer, you write to APIs. WinFS includes a rich API that is part of the overall WinFX programming model in Longhorn. Through the WinFS API, you can program the different building blocks of the WinFS system including data operations, rules, synchronization, and the data model.
Conclusion
There is a lot to talk about over the coming months as we start drilling into the different areas of WinFS. To get started, you will want to get a firm understanding of the other Longhorn pillars, especially "Avalon," since many of the applications that we will be building in the coming year will be Avalon-based. I recommend that you take a look at Chris Sells' overview of the Longhorn pillars in his first Longhorn Foghorn column. Beyond that, get ready for WinFS: It's the future of information-driven applications.
The WinFS Files
Thomas Rizzo is a director in the Microsoft SQL Server group. In his spare time, Tom writes books on programming for Microsoft Press, helps customers on the Microsoft newsgroups, and occasionally updates his blog (which he should do more often!). You can reach Tom at thomriz@microsoft.com.