Integrating External Document Repositories with SharePoint Server 2007

Summary: Learn to design, implement, and deploy a complete solution for integrating an external document repository with Microsoft Office SharePoint Server 2007. External repositories can participate in workflows and use task lists and extensible metadata. (15 printed pages)

Trent Swanson, Microsoft Corporation

Bhushan Nene, Microsoft Corporation

Scot Hillier, SharePoint MVP

February 2009

Applies to: Microsoft Office SharePoint Server 2007, Windows SharePoint Services 3.0

Download the code sample that accompanies this article: Integrating External Content Repository with SharePoint Server 2007


  • Introduction to Unstructured Data Integration with SharePoint Server 2007

  • Scenario

  • Understanding Design and Implementation Details

  • Conclusion

  • Additional Resources

Introduction to Unstructured Data Integration with SharePoint Server 2007

When developing an information strategy, organizations often begin by considering the structured data found in line-of-business (LOB) systems. Structured data, however, represents less than one third of the total data in an organization. The vast majority of data lives in unstructured documents such as proposals, purchase orders, invoices, employee reviews, and reports. In the enterprise, documents are stored in many different repositories including enterprise content management applications, enterprise resource planning systems, customer relationship management systems, product life-cycle management systems, custom LOB applications, and file shares. It can be difficult for information workers who want to use the data contained within these documents to locate the documents and integrate them within their daily work.

Microsoft Office SharePoint Server 2007 provides an excellent platform for storing, retrieving, and using document-based data within the enterprise. Office SharePoint Server 2007 provides document libraries that enable organized storage of documents using content types, metadata, and the retrieval of these documents by searching, sorting, and filtering. Documents can also participate in workflow activities that represent business processes that integrate the information worker with the document data. And Office SharePoint Server 2007 provides an enterprise search capability that can locate documents within a SharePoint server farm or external source.

You can gain significant benefits by migrating documents from external repositories (repositories that are not SharePoint repositories) into Office SharePoint Server 2007, but this process can be time consuming. Many organizations have significant investments in external repositories and want to maintain them. Therefore, a solution is needed that integrates external repositories with Office SharePoint Server 2007 so that organizations can maintain their existing investments in document repositories while taking advantage of the metadata, search, workflow, and collaboration capabilities of Office SharePoint Server 2007.

This article presents a complete architecture and sample code for integrating an external repository with Office SharePoint Server 2007. The solution enables you to view and manage documents in an external repository through the Office SharePoint Server 2007 user interface (UI). While some repository vendors provide Web parts that enable document browsing and viewing, this solution goes far beyond these basic capabilities to provide a complete integration of the repository. Using this architecture, end users can create libraries in Office SharePoint Server 2007 that include documents from an external repository without copying those documents into Office SharePoint Server 2007. Users are also able to maintain the documents by using Office SharePoint Server 2007 capabilities such as check-in, check-out, edit properties, version history, and document upload and download. The solution also enables complete integration of external documents with SharePoint workflows, list columns, task lists, and other libraries.

The architecture presented in this article uses the Content Management Interoperability Services (CMIS) standard for accessing enterprise content management (ECM) systems in a way that is platform-independent. CMIS is a standard developed by Microsoft, the EMC Corporation, and IBM that uses SOAP, Representational State Transfer (REST), and Atom Publishing Protocol to enable communication with and between ECM systems. While the sample is not intended to be a complete example of a CMIS implementation, it can be used as a starting point for such an effort. For more information, download a complete draft CMIS specification: Content Management Interoperability Services. Figure 1 shows a conceptual diagram of CMIS.

Figure 1. Conceptual diagram of CMIS



Scenario

This article uses a scenario to help present the benefits of integrating an external document repository with Office SharePoint Server 2007. In this scenario, the external repository contains financial documents such as invoices and purchase orders. Information workers in the organization create these financial documents through collaboration. Review tasks are assigned, and several versions of the document may be created. Additional metadata, beyond what is available in the repository, may be needed to properly classify the documents. After a document is created, a review and approval process is initiated. Information workers also use search to locate documents that were created and approved in the past. The following sections present a detailed scenario in which the Integrating External Content Repository with SharePoint Server 2007 solution is used to create and approve a purchase order for a laptop.

Creating a New Document Library

The scenario begins with a site owner creating a library. The solution presents a type of library named "External Library". This library is selected from the Create page just like any other list or library. Figure 2 shows the External Library type on the Create page.

Figure 2. Creating an External Library


During the creation process of the External Library, the site owner must provide some configuration information that enables the library to connect with the target repository. In this solution, the Repository ID and the Root Object ID must be provided. This information determines which repository is displayed and the root where browsing begins. Figure 3 shows the provisioning page where the Repository ID and Root Object ID are specified. In our scenario, the site owner set up the External Library to reference a folder of purchase orders and invoices beginning at the folder object named "ROOT" (see Figure 3).

Figure 3. Provisioning a new External Library


Browsing the Library

After the library is provisioned, users may browse the library. The default library view uses a custom ASPX page with a custom Web Part that hosts a custom Microsoft Silverlight application. This Silverlight application provides an interface that looks similar to the standard SharePoint document library, but surfaces documents from the target repository instead. Figure 4 shows the default view of a new External Library.

Figure 4. Default view of an external repository in SharePoint Server


In our scenario, the site owner wants to add additional metadata to the library beyond what is present in the repository. Specifically, the site owner wants to add a Charge Account field to record the account being used for any purchase.

The site owner can create the field by clicking the List Settings link in the toolbar, which opens the standard Customize page for the library. On this page, the site owner can add fields in the same way as for any SharePoint library. The External Library merges any fields defined in Office SharePoint Server 2007 with the fields coming from the repository, to form a single view of metadata for the document. Figure 5 shows the document properties for a selected document after fields are added. In the figure, the Charge Account field comes from Office SharePoint Server 2007 and all of the other fields are sourced from the repository.

Figure 5. Adding metadata


Creating the Purchase Order

After the External Library is created and configured, end users can access it. In our scenario, an end user accesses the library to upload a new purchase order. The user has already created the purchase order as a Microsoft Office Word 2007 document, but needs to upload it and start an approval workflow. Using the library toolbar, the end user chooses to upload a new document. When the document is uploaded, the end user must specify the content type of the document. Figure 6 shows the document uploaded as a purchase order.

Figure 6. Uploading a new purchase order


Assigning a Task

As part of the document preparation, the user wants to get a review from the IT department to ensure that the laptop to be purchased is approved for use with the current infrastructure. The user clicks the document in the library and chooses to create a task. The user is prompted for the name of a task list to use, and a task is created. The IT department receives an e-mail notification of the new task, reviews the document for compliance, and completes the task. Figure 7 shows the dialog box for creating a task, and Figure 8 shows the created task with a link to the associated document in the repository.

Figure 7. Creating a task


Figure 8. A new assigned task


Starting a Workflow

After the initial collaboration is complete, the end user needs to take the purchase order through the approval process. The approval process is implemented as a workflow in Office SharePoint Server 2007. The end user can select to start a workflow from the context menu, which then opens the standard SharePoint workflow page. The process runs in the standard way, by assigning review tasks to approvers. Figure 9 shows a workflow task assigned against the new purchase order. After the workflow is complete, the purchase order is archived in the repository as an approved document.

Figure 9. Assigning a workflow task


Searching for a Document

Sometime after the purchase order is approved, the user receives the laptop. As part of an internal audit process, the original purchase order must be reviewed against the received items. To perform the review, an auditor searches for the original purchase order by using Enterprise Search in Microsoft Office SharePoint Server 2007. In the SharePoint Search Center site, the auditor searches for the purchase order by number and receives results.

Solution Architecture Overview

The sample solution uses the architecture shown in Figure 10. In this architecture, a custom repository uses a portion of the file system to store documents, versions, and metadata. By using a custom repository in the solution, the sample is independent of any particular document management system while showcasing the required integration points with Office SharePoint Server 2007.

The repository used in this sample is not a complete ECM system. The sample is intended to be used for testing. In a production environment, you could replace the sample repository with an ECM system that is accessed through a set of Web services that have the same signatures as the sample. The remaining parts of the sample (Web Parts and protocol handler) would then be able to work with the new repository.

Figure 10. Sample solution architecture


Document Repository and WCF Web Services

The document repository in the sample uses a set of Windows Communication Foundation (WCF) Web services to access documents that are stored on the file system of the host computer. The Web services expose a set of functions that can be used to execute common operations (for example, check-in, check-out, versioning, and delete) on documents in the repository. These Web services support the UI and content indexing for search.

ASMX Web Services

The ASMX Web services are deployed within the SharePoint context. The purpose of the ASMX Web service layer is to support access to the SharePoint Single Sign-On (SSO) service and provide cross-domain support for the Silverlight UI. The SSO service is used to map Windows credentials to repository credentials, which represents the common scenario where a repository uses a custom security system. The ASMX Web services also prevent a situation in which Silverlight is trying to make a cross-domain call directly to the WCF Web services that are supporting the repository. Although such calls are possible, they can introduce security vulnerabilities, so the sample avoids this approach.

Silverlight UI

The Silverlight UI is the primary component supporting user interaction with documents in the repository. The Silverlight UI is delivered through a Web Part that hosts an .xap file (XAP package), which contains the definition of the Silverlight UI. The solution also contains a custom document library, which supports the user by providing access to advanced Office SharePoint Server 2007 capabilities such as workflows, custom columns, and task lists.

Protocol Handler

The protocol handler supports the Office SharePoint Server 2007 indexing engine by allowing access to the repository. The protocol handler allows for indexing and subsequent searching of documents in the repository by using the Search Center site.

Understanding Design and Implementation Details

The following sections present key details for each of the major components in the sample solution, and discuss significant challenges, design details, and implementation approaches.

Document Repository

The document repository uses a directory structure on the host server. This structure has a "MyRepository" folder, which contains information about the repository, its content types, and users. Each document in the repository is saved with a version number appended to the title so that multiple versions may be stored and version history can be tracked. Metadata for the document is tracked in an associated .xml file—one file for each document. Figure 11 shows the repository structure for the sample with a document and properties file visible.

Figure 11. The repository file/directory structure


The MyRepository folder represents the top of the repository structure. The MyRepository folder is created during the installation process under the folder where the repository is installed. The MyRepository folder contains three XML documents that contain key information about the repository: _cmis_repository.xml, _cmis_ctypes.xml, and _cmis_security.xml.

The file _cmis_repository.xml describes the repository product, including the name of the repository and the version number. Listing 1 shows the information contained in the file.

Listing 1. _cmis_repository.xml

<?xml version="1.0"?>
<getRepositoryInfoResponse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <repositoryDescription>My Test Repository</repositoryDescription>
  <productName>My Repository</productName>
    <capabilityPWCUpdateable xmlns="">
    <capabilityPWCSearchable xmlns="">
    <capabilityAllVersionsSearchable xmlns="">
    <capabilityQuery xmlns="">
    <capabilityJoin xmlns="">
    <capabilityFullText xmlns="">

The file _cmis_ctypes.xml contains information about the available content types supported by the repository. Whenever a document is uploaded to the repository, the Silverlight UI presents a list of available content types from which the user can select. The content types in the repository should not be confused with SharePoint content types. The sample does not use specialized SharePoint content types. Listing 2 shows the available content types, as defined in the _cmis_ctypes.xml file.

Listing 2. _cmis_ctypes.xml

<?xml version="1.0" encoding="utf-8" ?>
  <ContentType Name="Folder">
  <ContentType Name="Invoice">
      <field Name="InvoiceNumber" DisplayName="Invoice Number"
      type="text" />
      <field Name="InvoiceAmount" DisplayName="Invoice Amount"
      type="text" />
      <field Name="InvoiceDate" DisplayName="Invoice Date"
      type="datetime" />
      <field Name="ChargeBack" DisplayName="Charge Back"
      type="boolean" />
      <field Name="Status" DisplayName="Status" type="choice">
  <ContentType Name="PurchaseOrder">
      <field Name="PONumber" DisplayName="PO Number" type="text" />
      <field Name="POAmount" DisplayName="PO Amount" type="text" />
      <field Name="PODate" DisplayName="Purchase Date"
      type="datetime" />
      <field Name="NewVendor" DisplayName="New Vendor" type="boolean"/>
      <field Name="Status" DisplayName="Status" type="choice">

The file _cmis_security.xml contains security information for the repository. The sample uses a custom security system that authenticates users against the information contained in the _cmis_security.xml file. Listing 3 shows the contents of the _cmis_security.xml file.

Listing 3. _cmis_security.xml

<?xml version="1.0"?>
<credentials xmlns:xsi=

Whenever a user attempts to access an object in the repository, the repository authenticates the user credentials against those contained in the _cmis_security.xml file. Authorization information is stored in an associated property file for each document and maps user names to privileges for the given document. The associated property document also contains the metadata for the document. Listing 4 shows a sample property document containing metadata and rights information.

Listing 4. Sample metadata and rights for a document

<?xml version="1.0" encoding="utf-8" ?> 
<getPropertiesResponse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <title>New Laptop</title> 
  <createdDate>2/28/2002 12:15:12</createdDate> 
  <modifiedDate>3/15/2004 12:15:12</modifiedDate> 
    <PONumber type="text">456</PONumber> 
    <POAmount type="text">1500.50</POAmount> 
    <PODate type="datetime">8/15/2005 11:30:22</PODate> 
    <NewVendor type="boolean">True</NewVendor> 
    <Status type="text">Draft</Status> 

Accessing the Custom Repository Using WCF Services

The custom repository in the sample is accessed through a set of Windows Communication Foundation (WCF) services. These services implement a portion of the CMIS specification and expose methods for sample operations on the repository. The methods are contained in four services as follows: repository, navigation, object, and versioning. The repository service provides access to functions that apply to the repository as a whole, such as making a connection. The navigation service provides functions for navigating the object hierarchy of the repository to display folders and documents. The object service provides functions for operating on an individual object, such as returning the metadata for an object, uploading, or checking out. The versioning service provides functions for managing the versions of an object.

These services provide strong support for the sample, but they are not fully compliant with the CMIS specification. A compliant implementation includes both SOAP and REST endpoints and implements all of the required repository operations. The sample implements only a subset of the available operations.

Because the repository uses a custom authentication system, the services and the repository are deployed together as a "self-hosted" WCF service. Self-hosting allows the repository to be deployed as an executable file and to use an authentication scheme that is not based on Windows authentication, by passing credentials along with any method call. This approach enables the custom repository to accurately represent commercial document management systems that use custom security schemes. The sample contains an installer for the host and services. After it is installed, the host can be started and the repository is available.

ASMX Web Services

The self-hosted WCF services are accessed through a set of ASMX Web services that are hosted in Office SharePoint Server 2007. The endpoint for the WCF services is stored in the web.config file for the SharePoint Web application where the solution is deployed. The ASMX Web services were created to act as a proxy for the Silverlight UI, to allow access to the SSO service, and to provide a mechanism for integrating Office SharePoint Server 2007 functionality with the repository.

The ASMX Web services are deployed in the SharePoint LAYOUTS directory through the use of a SharePoint solution package (.wsp file). The Silverlight UI can easily locate the endpoints for the ASMX Web services by using the _layouts virtual directory mapped by Office SharePoint Server 2007. Because the Silverlight UI is deployed through a SharePoint Web Part, it should not call the repository directly; a cross-domain call requires special configuration. A better approach is to have the Silverlight UI call the ASMX Web services, which are in the SharePoint context, and then have the ASMX Web services call the repository. The SharePoint Web Part is deployed to the target site through the use of another solution package.

The SSO service, which is accessed by the ASMX Web services, is used to store the repository credentials mapped to the user's Windows account. Using this architecture, the Silverlight application may call the ASMX Web services by using the Windows account of the current user. The ASMX Web services may then access the SSO service to retrieve the repository credentials for the current user. The repository credentials are then passed along with the WCF service call to the repository to perform the desired operation. The SSO service is accessed through a custom class that makes use of the functionality contained in the Microsoft.SharePoint.Portal.SingleSignon namespace. Listing 5 shows the code for the SSO class.

Office SharePoint Server 2007 does not normally have the SSO service running, so the service must be started and configured for the sample to run. Additionally, an SSO application must be defined to map the Windows credentials to repository credentials. The application name is stored in the web.config file so that the ASMX Web services can easily access it.

Listing 5. Accessing the SSO service

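using Microsoft.SharePoint.Portal.SingleSignon;

// Retrieves the repository credentials that the SSO service maps
// to the Windows account of the current user.
public class SSO
{
  private string username;
  private string password;
  private string statusMessage;

  public SSO() { }

  // ApplicationName is the SSO application name stored in web.config.
  public SSO(string ApplicationName)
  {
    string[] credentials = null;
    // Fills the array with the stored credential fields for the application.
    Credentials.GetCredentials(1, ApplicationName, ref credentials);
    username = credentials[0];
    password = credentials[1];
    statusMessage = "Success";
  }

  public string Username
  { get { return username; }}

  public string Password
  { get { return password; }}

  public string StatusMessage
  { get { return statusMessage; }}
}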

In addition to accessing the repository credentials stored in the SSO service, the ASMX Web services also offer the opportunity to integrate SharePoint information with the repository information returned from the WCF service calls. For example, the ASMX service can merge repository metadata with SharePoint metadata for any object returned from the repository. In this way, SharePoint users may create columns for items in the repository by using standard SharePoint facilities. These columns are then stored in Office SharePoint Server 2007 separately; they do not affect the repository itself. Additionally, the ASMX Web services may access SharePoint workflow information, tasks, and other libraries to support complete integration of the external document with Office SharePoint Server 2007 capabilities.
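The merge described above can be pictured with a small sketch. This is not the sample's code: the MetadataMerger class and its Merge method are hypothetical names, and the rule that a repository value wins when both sources define the same field name is an assumption, because the article does not say how name collisions are resolved.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the sample): combines the fields returned
// by the repository with the extra columns stored in SharePoint.
public static class MetadataMerger
{
    public static Dictionary<string, string> Merge(
        IDictionary<string, string> repositoryFields,
        IDictionary<string, string> sharePointFields)
    {
        // Field names are compared without regard to case (an assumption).
        var merged = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        // Start with the SharePoint-only columns, such as Charge Account.
        foreach (KeyValuePair<string, string> field in sharePointFields)
        {
            merged[field.Key] = field.Value;
        }

        // Assumption: on a name collision the repository value takes
        // precedence, because the repository owns the original metadata.
        foreach (KeyValuePair<string, string> field in repositoryFields)
        {
            merged[field.Key] = field.Value;
        }

        return merged;
    }
}
```

With repository fields such as PONumber and Status and a SharePoint column such as Charge Account, the result is a single dictionary that a UI could bind to as one property sheet, similar to the view in Figure 5.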

Custom Document Library

A custom document library feature was created in the sample to support a more seamless integration experience. Initially, someone might question the creation of a separate document library feature because the documents should be stored in the external repository. Creating a custom document library feature, however, enables many of the key integration points in Office SharePoint Server 2007 and makes the user experience similar to the experience with a standard document library. For example, creating a custom document library feature allows external libraries to be created from the Create page, as shown previously in Figure 2. This, however, is just the starting point for integrating Office SharePoint Server 2007 and the repository.

In the custom document library feature, all of the standard views are removed and replaced by a single view that contains the Silverlight application. This means that a standard Office SharePoint Server 2007 document library is created by the feature, but it is inaccessible to the user because the only view supported by the library displays the Silverlight application, which renders only documents from the repository. The hidden document library, however, gives us the ability to more fully integrate with Office SharePoint Server 2007, as described in the following sections.

Creating Custom Metadata

In the solution, metadata is displayed in the Silverlight application. Initially, this metadata is based on the content type contained in the repository. However, end users can click List settings on the toolbar to access the standard Customize page for the inaccessible document library. On the Customize page, end users can define new columns for the library in the standard way.

During document retrieval, the ASMX Web services check to see whether additional custom metadata has been defined for the hidden document library. If there is additional metadata, the ASMX Web services create a small shortcut file and load it into the hidden document library. The small shortcut file then enables specific values to be entered into the custom columns and associated with the repository document. The shortcut file is created only for documents that are viewed by the user, so the number and size of these files is kept small. The key point here is that only the additional metadata is stored in Office SharePoint Server 2007. The original file and the original metadata are still stored only in the repository.

Integrating a Repository Document with a Workflow

Users may choose to start a workflow on a repository document by selecting Workflows from the context menu. When the Workflows item is selected, the solution checks out the document from the repository and copies it to the hidden document library. After the repository document is copied, the user is redirected to the standard workflow start page. At this point, starting a workflow causes it to run against the document that was copied into the hidden library. Copying the document into the hidden library enables all workflow functionality for the document; however, the workflow itself must copy the document back into the repository and check it in before completing. Although the entire document must be copied into the hidden library to support workflow, the copying is limited to only those documents participating in workflow. Therefore, the number of copied documents should remain acceptable.
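The round trip can be sketched in a few lines. Everything named here is hypothetical: IExternalRepository stands in for the repository's object and versioning services, InMemoryRepository exists only to exercise the sketch, and a dictionary plays the role of the hidden document library. The sample's actual service contracts are not reproduced.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical contract standing in for the repository services.
public interface IExternalRepository
{
    byte[] CheckOut(string objectId);
    void CheckIn(string objectId, byte[] content);
}

// In-memory stand-in used only to exercise the sketch.
public class InMemoryRepository : IExternalRepository
{
    private readonly Dictionary<string, byte[]> documents = new Dictionary<string, byte[]>();
    private readonly HashSet<string> checkedOut = new HashSet<string>();

    public void Add(string objectId, byte[] content) { documents[objectId] = content; }
    public byte[] Read(string objectId) { return documents[objectId]; }
    public bool IsCheckedOut(string objectId) { return checkedOut.Contains(objectId); }

    public byte[] CheckOut(string objectId)
    {
        if (!checkedOut.Add(objectId))
            throw new InvalidOperationException("Already checked out: " + objectId);
        return documents[objectId];
    }

    public void CheckIn(string objectId, byte[] content)
    {
        if (!checkedOut.Remove(objectId))
            throw new InvalidOperationException("Not checked out: " + objectId);
        documents[objectId] = content;
    }
}

public class WorkflowRoundTrip
{
    private readonly IExternalRepository repository;

    // Stands in for the hidden SharePoint document library.
    private readonly Dictionary<string, byte[]> hiddenLibrary = new Dictionary<string, byte[]>();

    public WorkflowRoundTrip(IExternalRepository repository)
    {
        this.repository = repository;
    }

    // Before the workflow starts: check out the document and copy it
    // into the hidden library.
    public void PrepareForWorkflow(string objectId)
    {
        hiddenLibrary[objectId] = repository.CheckOut(objectId);
    }

    // While the workflow runs, changes apply to the hidden-library copy.
    public void UpdateCopy(string objectId, byte[] content)
    {
        if (!hiddenLibrary.ContainsKey(objectId))
            throw new InvalidOperationException("No workflow copy for: " + objectId);
        hiddenLibrary[objectId] = content;
    }

    // Before the workflow completes: copy the document back and check it in.
    public void CompleteWorkflow(string objectId)
    {
        repository.CheckIn(objectId, hiddenLibrary[objectId]);
    }
}
```

Keeping the check-out open for the whole workflow means that other repository users cannot change the document while the SharePoint copy is under review, which is one plausible reason for the ordering described above.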

Associating a New Task with a Repository Document

Users may choose to create a task and associate it with a repository document. When users create a task, the Silverlight application prompts them for the name (in the current site) or URL (in another site) of the task list where the task should be created. The ASMX Web services examine the target task list to determine whether it contains a URL field named External Repository. If the list does not contain this field, it is added. The ASMX Web services then create a task in the target list and fill this field with a hyperlink to the target document in the repository. The end user is then redirected to the Edit page for the task to fill in the remaining task information.

Copying Repository Documents

Users may choose to copy a document from the repository to any other SharePoint library by selecting Send To from the context menu. When a user selects Send To, the Silverlight application prompts the user for the name (in the current site) or URL (in another site) of the target SharePoint library. The ASMX Web services then copy the document from the repository into the target SharePoint library.

Hosting the Silverlight Application in a Custom Web Part

The Silverlight application is hosted in a custom Web Part that is added to the default view of the custom document library feature. The Web Part is an excellent host for the Silverlight application because the Web Part can expose properties that can be set in Office SharePoint Server 2007 and passed on to Silverlight. During the external library provisioning process, Web Part properties are set for the Repository ID and the Root Object ID, as shown in Figure 3. These values are then sent to the Silverlight application so that it can connect to the appropriate repository and display documents beginning at the appropriate node in the repository.

Communication between the Silverlight application and the Web Part is accomplished by using the InitParameters property. This property can be set by the Web Part and then read by Silverlight when the application loads. The InitParameters property is a comma-delimited string of name=value pairs that may be set as shown in the following code.

silverLight.InitParameters = "RepositoryId=" + repositoryId 
+ ",RootObjectId=" + rootobjectId;
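On the receiving side, a Silverlight application normally sees these values already split into name/value pairs (through the InitParams dictionary on the startup event arguments). The following plain C# sketch shows what that splitting amounts to for the string built above. InitParameterParser is a hypothetical helper written for illustration; it is not part of the sample or of Silverlight.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits a comma-delimited "name=value" string of the
// kind assigned to InitParameters into a dictionary.
public static class InitParameterParser
{
    public static Dictionary<string, string> Parse(string initParameters)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        if (string.IsNullOrEmpty(initParameters))
        {
            return result;
        }

        foreach (string pair in initParameters.Split(','))
        {
            // Split on the first '=' only, so a value may itself contain '='.
            int separator = pair.IndexOf('=');
            if (separator <= 0)
            {
                continue; // Skip fragments that have no name.
            }

            string name = pair.Substring(0, separator).Trim();
            string value = pair.Substring(separator + 1).Trim();
            result[name] = value;
        }

        return result;
    }
}
```

Because the comma is the delimiter, a Repository ID or Root Object ID that contains a comma would be split incorrectly. If such identifiers are possible, the Web Part would need to encode the values before concatenating them.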

Using the Silverlight Application as UI

The Silverlight application acts as the primary user interface to the external repository. The sample uses Microsoft Silverlight 2.0 and uses a grid to display documents in the repository. A context menu for individual documents was created by using a Popup control that appears whenever the user clicks on a document in the grid. Figure 7 shows the context menu in action. A Popup control is also used to create a modal dialog box for displaying information, such as version history and document properties. These dialog boxes are shown in Figure 5, Figure 6, and Figure 7.

The sample uses a fairly basic UI, but a production system could use a much more sophisticated interface. Silverlight supports a wide variety of controls and animations that you could use. A production system could also define additional views for the UI, such as a TreeView control.

The Silverlight application communicates with the ASMX Web services for all repository operations. When the application launches, it establishes connections with the ASMX Web services. As the user interacts with the Silverlight application, calls are made to the ASMX Web services, which can return document information, version information, property information, and more.

Searching the Repository

Searching the repository is made possible through a custom protocol handler created for Office SharePoint Server 2007. The protocol handler uses the WCF Web services to retrieve object information from the repository during the indexing process. This information is added to the SharePoint index, which may then be searched by using the standard Search Center site.

The protocol handler in the sample successfully allows the SharePoint indexing engine to crawl the repository content, but some challenges arose with the manner in which content was indexed and searched. Ideally, the protocol handler should provide a security descriptor to the indexing engine for each object in the repository. The security descriptor allows the Search Center site to security-trim search results so that end users see only documents for which they have rights. This security descriptor is based on Windows credentials because the identity of the user in Office SharePoint Server 2007 is a Windows identity.

The challenge for the protocol handler is that the repository uses a custom security system, not Windows security. This means that the repository credentials must be mapped in reverse to obtain the associated Windows user name for each repository user. Unfortunately, there is no mechanism in the SharePoint SSO service to reverse-map user credentials; the mapping is only from Windows account to repository account. Because the protocol handler cannot provide a reverse-mapped security descriptor, the search results in the sample cannot be properly trimmed. Therefore, end users will see results for documents that they cannot open.

There are several possible workarounds for creating a security descriptor:

  • Maintain a separate mapping of user names in an XML file. With this solution, the protocol handler could build a security descriptor, but the drawback is that this file must be maintained separately.

  • Use a naming scheme so that there is an obvious relationship between the Windows credentials and the repository credentials. For example, a Windows account of "DOMAIN\Administrator" might map to a repository user name of "administrator". Using such a scheme, the protocol handler could infer the Windows account from the repository user name. The drawback here is that you might not have that kind of control over production credentials.

  • Use custom security trimming. Office SharePoint Server 2007 supports custom security trimming at query time. Using this approach, you can examine the search results and trim out objects that the current user does not have permission to see. Custom security trimming may seem like the most elegant approach, but it is also the most expensive. A custom security trimming solution can have a significant impact on search performance.

  • Implement the Microsoft Identity Lifecycle Manager (ILM). ILM provides facilities for managing user identity across multiple systems. Using ILM, you can provision new users and synchronize identities between systems. ILM is beyond the scope of this sample. For more information, see Microsoft Identity Lifecycle Manager 2007 FP1.
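The first two workarounds can be combined, as the following sketch suggests: look up the repository user name in an explicit mapping first, and fall back to a naming convention when no entry exists. WindowsAccountResolver is a hypothetical class, not part of the sample, and the sketch assumes that the mapping has already been loaded (for example, from a separately maintained XML file) into a dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: infers the Windows account for a repository user so
// that a protocol handler could build a security descriptor.
public class WindowsAccountResolver
{
    private readonly string domain;
    private readonly Dictionary<string, string> explicitMappings;

    public WindowsAccountResolver(string domain, IDictionary<string, string> explicitMappings)
    {
        this.domain = domain;

        // Copy into a case-insensitive dictionary; user names are treated
        // as case-insensitive here (an assumption).
        this.explicitMappings = explicitMappings == null
            ? new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
            : new Dictionary<string, string>(explicitMappings, StringComparer.OrdinalIgnoreCase);
    }

    public string Resolve(string repositoryUserName)
    {
        string windowsAccount;

        // Workaround 1: an explicit, separately maintained mapping.
        if (explicitMappings.TryGetValue(repositoryUserName, out windowsAccount))
        {
            return windowsAccount;
        }

        // Workaround 2: a naming convention, DOMAIN\<repository user name>.
        return domain + "\\" + repositoryUserName;
    }
}
```

For example, with the domain set to "DOMAIN" and no explicit entry, the repository user "administrator" resolves to DOMAIN\administrator. The fallback is only as reliable as the naming convention itself, so an incorrect guess would grant or deny search visibility to the wrong account.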


Conclusion

The sample presented in this article demonstrates how to integrate an external document repository with Office SharePoint Server 2007. Its architecture can be used as a starting point for integrating any external repository with Office SharePoint Server 2007. Additionally, the sample shows how to use a custom library feature as the basis for using advanced SharePoint features, such as metadata, workflow, task assignment, and search.

Additional Resources

For more information, see the following resources: