Document Management and Much More
By Tom Rizzo
You may have already heard something about Microsoft's new product, now code-named Tahoe. Tahoe is designed to make generating and managing information easier, while providing a rich development environment for collaborative applications. In this article, we'll look at what Tahoe is, and is not, its features, and how a developer can build rich collaborative applications on the Tahoe platform.
When we set out to build Tahoe, we had a number of clear goals in mind, driven by the needs of our customers and partners. Customers tell us that it's difficult to locate information. With the explosion of intranets, file shares, and the Internet, information is plentiful, but finding the desired information can be difficult. Novice end users and experienced knowledge workers alike find it hard to organize and manage information. Many customers tell us that their current knowledge systems are either tricky to set up or difficult to manage. We also had a particular request for a departmental collaboration server that didn't call for the stringent infrastructure requirements of Microsoft Exchange 2000.
Microsoft Tahoe meets these goals by providing a simple and powerful way to access corporate and Internet information. Tahoe also provides a mainstream out-of-the-box document management system that integrates with the tools knowledge workers use to create information, including Microsoft Office, Web browsers, and the new Web Folders in Microsoft Office 2000. Finally, Tahoe ships a departmental version of the Web Storage System from Microsoft Exchange 2000 that allows developers to build solutions that can be rolled out departmentally without requiring modifications to the Active Directory.
Moreover, Tahoe doesn't require Active Directory to be deployed. Tahoe can also work with Windows NT 4.0 domains. Tahoe provides all the collaborative power of the Web Storage System without incurring the infrastructural overhead of rolling out Active Directory across an entire organization. There are some limitations (explained at the end of this article) if Active Directory isn't employed. If those limitations are a concern, you can always build a solution on Exchange, and then integrate Tahoe features to get around them.
Document Management Overview
The foremost feature of Microsoft Tahoe is out-of-the-box document management. Yet Tahoe targets a different market than most document management systems. Tahoe has 80 to 90 percent of the functionality of other document management systems, and its edge is that most knowledge workers don't need high-end document management to manage information effectively. Rather than imposing complex infrastructure and burdensome client-side requirements, Tahoe is easy to use and deploy. You won't see high-end features such as replicated document management with replicated document lock management. If you need this type of high-end functionality, work with one of the Microsoft partners that provides these capabilities.
Let's see how Tahoe simplifies document management by examining the features of its document management system, including check-in/check-out, linear versioning, document profiling, role-based security, and publishing/approval routes.
Check-in/check-out. Tahoe allows users to check out documents, which prevents other users from editing those documents. When users are done with a document, they can check it back in.
Linear versioning. Tahoe also provides linear versioning support. When a user checks a document out, Tahoe will "version" that document so the user can roll back to a previous version. This provides a backup in case the changes that were incorporated into a later version are no longer wanted. This version history is maintained through check-in and check-out. When a user checks a document back in, a "dot version," such as version 1.1, is created. When a document is marked as published, a major version is made. For example, if you publish version 1.1 of your document, the published version is 2.0. The user can go back to a previous document version at any time. FIGURE 1 shows the document version interface.
FIGURE 1: Tahoe document version interface.
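The check-out lock and the version numbering described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this article, not Tahoe's object model: the Document class, its method names, and the assumption that a new document starts at version 1.0 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Document:
    """Toy model of a managed document (hypothetical, not a Tahoe API)."""
    name: str
    major: int = 1  # assumption: a new document starts at version 1.0
    minor: int = 0
    checked_out_by: Optional[str] = None
    history: List[Tuple[int, int]] = field(default_factory=list)

    @property
    def version(self) -> str:
        return f"{self.major}.{self.minor}"

    def check_out(self, user: str) -> None:
        # Only one user may hold the document at a time.
        if self.checked_out_by is not None:
            raise PermissionError(
                f"{self.name} is checked out by {self.checked_out_by}")
        self.checked_out_by = user

    def check_in(self, user: str) -> str:
        if self.checked_out_by != user:
            raise PermissionError(f"{user} does not hold the check-out")
        self.history.append((self.major, self.minor))  # kept for rollback
        self.minor += 1  # a check-in creates a "dot version"
        self.checked_out_by = None
        return self.version

    def publish(self) -> str:
        if self.checked_out_by is not None:
            raise PermissionError("check the document in before publishing")
        self.history.append((self.major, self.minor))
        self.major, self.minor = self.major + 1, 0  # publishing creates a major version
        return self.version


doc = Document("plan.doc")
doc.check_out("alice")
print(doc.check_in("alice"))  # 1.1
print(doc.publish())          # 2.0
```

The history list is what makes a rollback possible: each entry records a version that existed before the latest check-in or publish.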
Document profiling. Tahoe lets users fill out document profiles, which are the metadata for the document. Examples of document profile properties can include author, category, or a customized property. There can be multiple document profiles. Knowledge coordinators can set up these profiles and make them mandatory for document check-in. FIGURE 2 shows the interface for setting the document profiles available in a folder. FIGURE 3 shows filling in a document profile when checking in a Tahoe document through Office.
FIGURE 2: The interface for setting document profiles in a folder.
FIGURE 3: Filling in a document profile when checking in a Tahoe document.
Role-based security. The role-based security in Tahoe simplifies administration. Instead of displaying the familiar ACL editor user interface to end users, Tahoe exposes a role-based security system in which users can fall into three primary roles: Reader, Author, and Coordinator. A Reader only has permissions to read the documents in the folder. An Author can create and edit documents. A Coordinator can read, write, edit, set permissions, and set document profiles for the folder.
Beyond folder-level access, Tahoe provides an easy way for users to deny read access to users at the document level as well. These roles are mapped back to Windows security settings by the Tahoe system. FIGURE 4 shows the user interface for setting role-based security.
FIGURE 4: User interface for setting role-based security.
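To make the three roles concrete, here is a minimal Python sketch of a role-to-permission table with a per-document "deny read" list. The Permission names, the can function, and the assumption that an Author can also read documents are illustrative only; the article does not describe how Tahoe represents roles internally.

```python
from enum import Flag, auto


class Permission(Flag):
    READ = auto()
    CREATE = auto()
    EDIT = auto()
    SET_PERMISSIONS = auto()
    SET_PROFILES = auto()


# Hypothetical mapping of the three folder roles to permissions.
ROLE_PERMISSIONS = {
    "Reader": Permission.READ,
    # Assumption: an Author can also read the documents in the folder.
    "Author": Permission.READ | Permission.CREATE | Permission.EDIT,
    "Coordinator": (Permission.READ | Permission.CREATE | Permission.EDIT
                    | Permission.SET_PERMISSIONS | Permission.SET_PROFILES),
}


def can(role, permission, user=None, denied_readers=frozenset()):
    """Return True if the folder role grants the permission on a document.

    denied_readers models the document-level "deny read" list.
    """
    if permission == Permission.READ and user in denied_readers:
        return False
    return bool(ROLE_PERMISSIONS[role] & permission)


print(can("Author", Permission.EDIT))             # True
print(can("Reader", Permission.SET_PERMISSIONS))  # False
```

Checking the deny list before the role table means a document-level denial wins over whatever the folder role grants.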
Publishing and approval routing capabilities. Publishing turns the document into a major version. If approval is set up, publishing will trigger an approval route before officially publishing the document. The approval route can be a serial route to approvers, or a parallel route sending the document to all approvers at once. The parallel route has the option of final document approval only if all involved approve, or if at least one approves. FIGURE 5 shows the interface for setting up document routing and approval.
FIGURE 5: The interface for setting up document routing and approval.
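The difference between the serial route and the two parallel options is easy to see in code. The following Python sketch uses hypothetical function names and assumes that a rejection ends a serial route, which the article does not state explicitly.

```python
from typing import Callable, Iterable

# A decision function maps an approver's name to True (approve) or False.
Decision = Callable[[str], bool]


def serial_route(approvers: Iterable[str], decide: Decision) -> bool:
    """Ask each approver in turn.

    Assumption: the first rejection ends the route.
    """
    for approver in approvers:
        if not decide(approver):
            return False
    return True


def parallel_route(approvers: Iterable[str], decide: Decision,
                   require_all: bool = True) -> bool:
    """Send the document to all approvers at once, then tally the votes."""
    votes = [bool(decide(approver)) for approver in approvers]
    return all(votes) if require_all else any(votes)


votes = {"ann": True, "raj": False}
print(serial_route(votes, votes.get))                       # False
print(parallel_route(votes, votes.get, require_all=True))   # False
print(parallel_route(votes, votes.get, require_all=False))  # True
```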
Ubiquitous Client Access
Tahoe simplifies document management by integrating with products people know and use. Office 2000 is a key client of Microsoft Tahoe. Tahoe extends Office 2000 menus with document management options such as check-in, check-out, and publishing. With these extensions, you can take advantage of the document management features of Tahoe directly from Office.
Tahoe also integrates with the Web Folders feature of Microsoft Office and Microsoft Windows. Through the Web Folders interface, which ultimately uses WebDAV (Web-based Distributed Authoring and Versioning; a set of HTTP extensions) to communicate with the server, you can browse and perform operations on your Tahoe information.
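For readers who have not seen WebDAV on the wire, the Python sketch below builds a standard PROPFIND request, the WebDAV method a client uses to list a folder and its properties. It constructs the request without sending it. The server URL is a made-up placeholder, and nothing here is specific to Tahoe; it is simply what a generic WebDAV client would issue.

```python
import urllib.request

# Standard WebDAV request body asking for all properties of each resource.
PROPFIND_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<D:propfind xmlns:D="DAV:"><D:allprop/></D:propfind>'
).encode("utf-8")


def build_propfind(folder_url: str, depth: str = "1") -> urllib.request.Request:
    """Build (but do not send) a PROPFIND request for a folder.

    Depth "0" asks about the folder itself; "1" includes its direct members.
    """
    return urllib.request.Request(
        folder_url,
        data=PROPFIND_BODY,
        method="PROPFIND",
        headers={"Depth": depth, "Content-Type": "text/xml; charset=utf-8"},
    )


# Hypothetical folder URL; substitute a real WebDAV server to try it.
request = build_propfind("http://tahoe.example.com/workspace/Documents/")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen against a real WebDAV server would normally yield a 207 Multi-Status response whose XML body lists the folder's members.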
Tahoe extends the Windows Explorer to provide richer views and semantics for the document management capabilities. For example, you can see a Tahoe view in the Windows Explorer, shown in FIGURE 6. This provides information about the document on the left, while the list view on the right has the custom document profile information. This extension and integration with Windows Explorer makes it easier for users to find the information they need.
FIGURE 6: Viewing Tahoe in Windows Explorer.
The final way you can access Tahoe is through standard Web browsers. Tahoe provides an out-of-the-box portal site which can incorporate not only Tahoe data, but also data from business applications. The Tahoe portal is built using Web Parts and the new Digital Dashboard 2.0 framework. (Web Parts are reusable components that wrap Web-based content such as XML, HTML, and scripts with a standard property schema that controls how the Web Parts are rendered in a digital dashboard. The Web Part SDK is available at: http://www.microsoft.com/DirectAccess/Products/SBS/CRK/files/digital_dashboard/CD/webparts.htm.)
This means that any of the Web Parts you build can be integrated into the Tahoe portal. Tahoe exposes its functionalities, such as search and subscriptions, as Web Parts. You can take these Web Parts and integrate them into the dashboards that you build. All of the document management features of Tahoe are displayed through standard browsers, such as Internet Explorer and Netscape Navigator. By integrating Digital Dashboard, the Tahoe portal provides an extensible platform (via Web Parts) for you as a developer. FIGURE 7 shows the Tahoe portal.
FIGURE 7: The Tahoe portal.
Extensive Search Features
Beyond document management and portal capabilities, Tahoe provides rich content indexing and search capabilities as well. Tahoe can "crawl" Office documents, intranet and Internet Web sites, file shares, Exchange 5.5 and 2000 servers, other Tahoe servers, and Lotus Notes servers. Unlike the content indexing included directly in Exchange 2000, which can support only Exchange 2000 data sources in a single index, Tahoe can crawl these multiple data sources and store the results in a single index. This means you can search all of these data sources with a single query. Tahoe also provides integrated security, so users don't see results for content they aren't permitted to access.
Tahoe also provides a saved-query search engine. This engine matches documents against saved queries as documents are indexed. This makes for a faster search, since Tahoe is smart about matching the documents to queries that users have saved. It also allows users to subscribe to a query, for example, "whenever Tom Rizzo is the author, notify me." Then, when a new document meets those criteria, the engine notifies the user via the portal or email.
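The idea of matching each document against saved queries at indexing time, rather than re-running every query later, can be sketched as follows. This Python example is a simplification under stated assumptions: queries are exact matches on property values, and the SavedQuery class and notify_on_index function are hypothetical names, not part of Tahoe.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SavedQuery:
    """A subscription: notify `subscriber` when all criteria match."""
    subscriber: str
    criteria: Dict[str, str]  # property name -> required value

    def matches(self, properties: Dict[str, str]) -> bool:
        return all(properties.get(key) == value
                   for key, value in self.criteria.items())


def notify_on_index(properties: Dict[str, str],
                    subscriptions: List[SavedQuery]) -> List[str]:
    """Return the subscribers to notify as a document is indexed."""
    return [q.subscriber for q in subscriptions if q.matches(properties)]


subscriptions = [
    SavedQuery("pat", {"author": "Tom Rizzo"}),
    SavedQuery("lee", {"category": "Finance"}),
]
new_document = {"author": "Tom Rizzo", "category": "Marketing"}
print(notify_on_index(new_document, subscriptions))  # ['pat']
```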
Developers can take advantage of the content indexing through familiar, standard APIs. For example, if content indexing is enabled and you perform an ADO query against a source that is content-indexed, Tahoe will leverage that index to provide the best results. A query can also change the relevance ranking of its result set by weighting attributes specified in the query. For example, if you searched for "Web Storage System" across all the documents in a Tahoe index, you might want to rank documents that also mention "Exchange 2000" as more relevant. With Tahoe search, you can provide these capabilities in your applications.
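The ranking example above amounts to boosting the score of matching documents that also contain a second term. The Python sketch below shows that idea with a deliberately naive score (a count of phrase occurrences multiplied by a boost factor). The rank function, its parameters, and the scoring formula are assumptions for illustration; the article does not describe how Tahoe computes relevance or what its query syntax looks like.

```python
from typing import Dict, List


def rank(documents: Dict[str, str], query: str,
         boost_term: str, boost: float = 2.0) -> List[str]:
    """Return titles of documents containing `query`, most relevant first."""
    scored = []
    for title, text in documents.items():
        lowered = text.lower()
        hits = lowered.count(query.lower())
        if hits == 0:
            continue  # not a match for the main query
        score = float(hits)
        if boost_term.lower() in lowered:
            score *= boost  # also mentioning the boost term raises relevance
        scored.append((score, title))
    # Highest score first; ties broken alphabetically by title.
    return [title for score, title in
            sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


documents = {
    "overview": "An overview of the Web Storage System.",
    "exchange": "Using the Web Storage System in Exchange 2000.",
    "other": "Notes on an unrelated topic.",
}
print(rank(documents, "Web Storage System", "Exchange 2000"))
# ['exchange', 'overview']
```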
Tahoe search is enterprise-ready. You can deploy multiple indexing or search servers across the organization to offload the indexing and user queries to dedicated boxes. Tahoe can support full and incremental crawls of data. For some data sources such as Tahoe data sources, you can set up a document-change notification for the indexing engine, so a document is indexed immediately if it changes.
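The difference between a full and an incremental crawl can be illustrated with a short Python sketch. It assumes, purely for illustration, that each item carries a modification timestamp and that an incremental crawl re-indexes only items changed since the previous crawl; the crawl function and its data shapes are hypothetical and say nothing about how Tahoe detects changes.

```python
from typing import Dict, List, Optional, Tuple


def crawl(source: Dict[str, Tuple[int, str]],
          index: Dict[str, str],
          last_crawl: Optional[int] = None) -> List[str]:
    """Index items from `source` and return the names that were (re)indexed.

    source maps item name -> (modified_time, content).
    last_crawl=None means a full crawl; otherwise only items modified
    after that time are indexed (an incremental crawl).
    """
    indexed = []
    for name, (modified, content) in source.items():
        if last_crawl is None or modified > last_crawl:
            index[name] = content
            indexed.append(name)
    return indexed


index: Dict[str, str] = {}
source = {"a.doc": (100, "first draft"), "b.doc": (100, "budget")}
print(crawl(source, index))                  # ['a.doc', 'b.doc']

source["a.doc"] = (250, "second draft")      # a.doc changes after the full crawl
print(crawl(source, index, last_crawl=200))  # ['a.doc']
```

A change notification, as described above for Tahoe data sources, would replace the timestamp comparison: the source tells the indexer which item changed, so no scan of unchanged items is needed.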
Departmental Web Storage System
One of the great things about Tahoe for developers is that the product is built on the Web Storage System. For those of you who have not heard of the Web Storage System, this is the enhanced collaborative technology in Exchange 2000, which added ADO/OLEDB support, WebDAV, Installable File System, server events, and workflow to Exchange. (For a detailed introduction to the Web Storage System, see "Exchange 2000 & Web Storage System: Getting to Your Semi-structured Data" by Alex Gomez, http://www.officevba.com/features/2000/10/vba200010ag_f/vba200010ag_f.asp.)
Since Tahoe builds on this technology, all of this support is included in the product. And since Tahoe doesn't require the infrastructure needed for Exchange, Web Storage System solutions can be deployed departmentally, or in enterprises that don't have an Exchange environment. This is a great win for developers who want to build rich collaborative applications, but don't want to install Exchange 2000.
Tahoe ships with, and extends, the familiar Microsoft Exchange object models, such as Collaboration Data Objects (CDO) for Exchange 2000. Tahoe adds document management capabilities to CDO, such as check-in/check-out, version control, and easier schema manipulation.
Tahoe does have some limitations, the main ones having to do with document management. As mentioned earlier, Tahoe isn't targeted at companies with high-end document management needs; its intended users are departmental or small-organization customers. This doesn't mean you can't use multiple Tahoe servers in a large organization to meet its document management needs. Search and content indexing, as well as the portal, are enterprise-ready.
Whew! That was just a quick overview of the capabilities of Microsoft Tahoe. Tahoe does have other capabilities that are beyond the scope of this article. Because Tahoe does not yet have an official name, it does not yet have its own Web site. To learn more, keep an eye on the Microsoft Web site at http://www.microsoft.com for updates, and download the beta of Tahoe now to give it a try. I guarantee you will like what you see.
Tom Rizzo is a product manager at Microsoft. He focuses on helping developers understand and leverage the new Web Storage System. Tom is also the author of two Microsoft Press books: Programming Microsoft Outlook and Exchange, 1st and 2nd editions (March 1999, June 2000). You can reach Tom at email@example.com.