Configure and use the Documentum connector (FAST Search Server 2010 for SharePoint)

 

Applies to: FAST Search Server 2010

The Microsoft SharePoint 2010 Indexing Connector for Documentum enables Microsoft FAST Search Server 2010 for SharePoint (through Microsoft SharePoint Server 2010) to index content that is stored in the EMC Documentum system. This article describes how to install and configure the Indexing Connector for Documentum for use with FAST Search Server 2010 for SharePoint.

To download the Indexing Connector for Documentum from the Microsoft Download Center, use the following link: Microsoft SharePoint 2010 Indexing Connector for Documentum (https://go.microsoft.com/fwlink/p/?LinkId=191180&clcid=0x409).

The Indexing Connector for Documentum includes the following features:

  • Based on the SharePoint 2010 Search Connector Framework

  • 64-bit connector

  • One connector supports multiple versions of EMC Documentum Content Server

  • Indexes Documentum objects and object metadata

  • Supports Documentum security definitions and policies

  • Supports Windows PowerShell for automated configuration and administration

  • Configurable search results URL to support multiple Documentum client applications

  • Supports file and folder exclusion for crawling

The following lists describe supported and unsupported object types and properties for the Indexing Connector for Documentum.

Supported container objects and properties:

  • dm_cabinet and subtypes

  • dm_folder and subtypes

  • r_object_type

  • object_name

  • title

  • subject

  • keywords

  • owner_name

  • r_creator_name

  • r_creation_date

  • r_modifier

  • r_modify_date

  • cabinetpath

  • folderpath

Supported document objects and properties:

  • dm_document and subtypes

  • authors

  • keywords

  • r_full_content_size

  • r_creation_date

  • object_name

  • r_modify_date

  • r_modifier

  • subject

  • title

  • r_object_type

  • a_content_type

  • owner_name

  • r_version_label

  • r_lock_date

  • r_lock_owner

  • r_policy_id

  • r_current_state

  • log_entry

  • r_creator_name

  • r_access_date

  • a_storage_type

  • i_retain_until

  • ContainerPath

  • All custom properties

Unsupported object types:

  • Temp cabinets

  • Temp folders

  • Temp files

Install and configure the prerequisites for the Indexing Connector for Documentum

Use the following procedure to install and configure the prerequisites for the Indexing Connector for Documentum. The steps are listed in the order that they must be performed.

The SharePoint 2010 Indexing Connector for Documentum has the following software prerequisites:

  • One of the following SharePoint Server 2010, Search Server 2010, or FAST Search Server 2010 for SharePoint products:

    • Microsoft SharePoint Server 2010

    • Microsoft Search Server 2010

    • Microsoft Search Server 2010 Express

    • Microsoft SharePoint 2010 for Internet Sites Enterprise

    • Microsoft SharePoint 2010 for Internet Sites Standard

    • Microsoft FAST Search Server 2010 for SharePoint

    • Microsoft FAST Search Server 2010 for SharePoint Internet Sites

  • DFS Server v6.5 with SP2 and DFS hotfix 1049. This server needs to be configured and connected to all repositories.

  • DFS Productivity Layer v6.5 with SP2, using the .NET assemblies from DFS hotfix 1049.

    The .NET assemblies are included in the DFS hotfix 1049 package. You can obtain the package, which includes both a server-side patch and a client-side patch, by opening a service request on the EMC Powerlink Web site: http://powerlink.emc.com. Alternatively, you can contact your EMC customer representative.

    The Indexing Connector for Documentum uses EMC DFS (Documentum Foundation Services) as the connectivity application programming interface (API) to access Documentum repositories. Therefore, you must install and configure DFS Productivity Layer (client of DFS Server) .NET components on the SharePoint Server 2010 crawl server where the Indexing Connector for Documentum will be installed.

To install and configure the prerequisites for the Indexing Connector for Documentum

  1. Verify that the user account that is performing this procedure is an administrator for the FAST Search Content Search Service Application.

  2. Determine which Documentum content access account you will use for crawling. You have to specify this account and its password later, when you set up crawl rules. The Indexing Connector for Documentum uses this account to retrieve content from the Documentum repository. The account must have the following permissions:

    • At least read permission to documents that you want to crawl.

    • At least browse permission to cabinets, folders, and records (documents with only metadata) that you want to crawl.

  3. On each SharePoint Server 2010 crawl server, deploy the DFS Productivity Layer .NET assemblies to the global assembly cache (%windir%\assembly). The Indexing Connector for Documentum uses four DLLs. Verify the DLL names and versions before you deploy them to the global assembly cache. The following files are included in the DFS hotfix 1049 package. When the package is extracted to the default path, the files are located in the directory %local%\emc-dfs-sdk-6.5\emc-dfs-sdk-6.5\lib\dotnet:

    • Emc.Documentum.FS.DataModel.Core.dll, version number 6.5.0.231

    • Emc.Documentum.FS.DataModel.Shared.dll, version number 6.5.0.231

    • Emc.Documentum.FS.runtime.dll, version number 6.5.0.231

    • Emc.Documentum.FS.Services.Core.dll, version number 6.5.0.231

    Note

    You can drag and drop the four DLLs into the global assembly cache (%windir%\assembly) to deploy them, but you might have to turn off User Account Control to do this.

  4. For the DFS Productivity Layer .NET assemblies to function correctly, you must update the .NET machine.config file to include WCF settings for the DFS Productivity Layer. On each SharePoint Server 2010 crawl server, the machine.config file is located in the following directory: %windir%\Microsoft.NET\Framework64\v2.0.50727\CONFIG. The WCF settings in the first substep allow a maximum of 30 megabytes (MB) per transferred Documentum content object (the document file plus its metadata). To allow larger content, increase the "maxReceivedMessageSize" value in the "DfsDefaultService" binding. By default, FAST Search Server 2010 for SharePoint handles files with a maximum size of 64 MB. To crawl files larger than 64 MB, follow the second substep of this step.

    1. Go to %windir%\Microsoft.NET\Framework64\v2.0.50727\CONFIG, open the machine.config file, and then add the following XML snippet into the <configuration> element:

      <system.serviceModel>
        <bindings>
          <basicHttpBinding>
            <binding name="DfsAgentService" closeTimeout="00:01:00"
                     openTimeout="00:01:00" receiveTimeout="00:10:00" sendTimeout="00:01:00"
                     allowCookies="false" bypassProxyOnLocal="false" hostNameComparisonMode="StrongWildcard"
                     maxBufferSize="10000000" maxBufferPoolSize="10000000" maxReceivedMessageSize="10000000"
                     messageEncoding="Text" textEncoding="utf-8" transferMode="Buffered"
                     useDefaultWebProxy="true">
              <readerQuotas maxDepth="32" maxStringContentLength="8192" maxArrayLength="16384"
                            maxBytesPerRead="4096" maxNameTableCharCount="16384" />
              <security mode="None">
                <transport clientCredentialType="None" proxyCredentialType="None"
                           realm="" />
                <message clientCredentialType="UserName" algorithmSuite="Default" />
              </security>
            </binding>

            <binding name="DfsContextRegistryService" closeTimeout="00:01:00"
                     openTimeout="00:01:00" receiveTimeout="00:10:00" sendTimeout="00:01:00"
                     allowCookies="false" bypassProxyOnLocal="false" hostNameComparisonMode="StrongWildcard"
                     maxBufferSize="10000000" maxBufferPoolSize="10000000" maxReceivedMessageSize="10000000"
                     messageEncoding="Text" textEncoding="utf-8" transferMode="Buffered"
                     useDefaultWebProxy="true">
              <readerQuotas maxDepth="32" maxStringContentLength="8192" maxArrayLength="16384"
                            maxBytesPerRead="4096" maxNameTableCharCount="16384" />
              <security mode="None">
                <transport clientCredentialType="None" proxyCredentialType="None"
                           realm="" />
                <message clientCredentialType="UserName" algorithmSuite="Default" />
              </security>
            </binding>

            <binding name="DfsDefaultService" closeTimeout="00:01:00"
                     openTimeout="00:10:00" receiveTimeout="00:20:00" sendTimeout="00:10:00"
                     allowCookies="false" bypassProxyOnLocal="false" hostNameComparisonMode="StrongWildcard"
                     maxBufferSize="10000000" maxBufferPoolSize="10000000" maxReceivedMessageSize="30000000"
                     messageEncoding="Text" textEncoding="utf-8" transferMode="StreamedResponse"
                     useDefaultWebProxy="true">
              <readerQuotas maxDepth="32" maxStringContentLength="8192" maxArrayLength="16384"
                            maxBytesPerRead="1048576" maxNameTableCharCount="16384" />
              <security mode="None">
                <transport clientCredentialType="None" proxyCredentialType="None" realm="" />
                <message clientCredentialType="UserName" algorithmSuite="Default" />
              </security>
            </binding>
          </basicHttpBinding>
        </bindings>
      </system.serviceModel>
      
    2. To crawl files larger than 64 MB, store the FAST Search Content Search Service Application in a variable, retrieve the current value of "MaxDownloadSize", and then set it to the maximum file size that you want to crawl.

      1. Run this procedure on the crawl server.

      2. On the Start menu, click All Programs.

      3. Click Microsoft SharePoint 2010 Products.

      4. Right-click SharePoint 2010 Management Shell, and then click Run as administrator.

      5. At the Windows PowerShell command prompt, type the following command(s):

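        # Get the search service application and store it in a variable.
        $ssa = Get-SPEnterpriseSearchServiceApplication -Identity <FAST Search Content Search Service Application>
        # Display the current maximum download size (in MB).
        $ssa.GetProperty("MaxDownloadSize")
        # Set the new maximum download size (in MB), and then save the change.
        $ssa.SetProperty("MaxDownloadSize", <File size larger than 64>)
        $ssa.Update()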
        

        Where:

        • <File size larger than 64> is the maximum file size (in MB) that you want to allow the Indexing Connector for Documentum to crawl.

        • <FAST Search Content Search Service Application> is the name of your FAST Search Content Search Service Application.

  5. The Indexing Connector for Documentum will crawl the Documentum document Access Control List (ACL) and map this list to the system ACLs. This allows users to search documents that they have permission to read in Documentum. The Indexing Connector for Documentum supports three kinds of ACL translations that you can configure in DCTMConfig.xml by using the Windows PowerShell cmdlet Set-SPEnterpriseSearchDCTMConnectorConfig.

    For more information about how to set these options and for examples, refer to Install and configure the Indexing Connector for Documentum.

    These are the options for setting up the system ACLs:

    • No Security

      The Indexing Connector for Documentum will ignore Documentum ACLs during crawl and every SharePoint user can search all crawled documents.

    • Assume the same account

      When Documentum and SharePoint 2010 Products are both using Active Directory Domain Services (AD DS) or Active Directory directory service, the Indexing Connector for Documentum assumes a user or group is using the same account in both systems.

    • Translate ACL according to a user mapping table

      If Documentum and SharePoint 2010 Products are not both using AD DS or Active Directory and you want to enable security-trimmed search, you have to set up a user mapping table to specify how to do the ACL translation.

  6. The user mapping table requires the following:

    • The user mapping table must be in a Microsoft SQL Server 2008 or later database.

    • The OSearch14 service account must have at least read permission on the user mapping table data.

    The user mapping table has the following columns:

    • DCTMCredentialDomain: Domain name of a Documentum account. Populate this column when the account comes from the local computer or an LDAP system, that is, when the User Source property of the Documentum account is None or LDAP. Otherwise, store an empty string ('') in this column.

    • DCTMCredentialRepository: Repository name of a Documentum account. Populate this column when the account comes from a Documentum repository.

    • DCTMCredentialLoginName: Login name of the Documentum account.

    • NTCredential: Windows domain user account that searches Documentum contents in SharePoint Server.

    Example: A Documentum repository user Dan Park has a login that is linked to the Finance repository. Dan's Windows domain user account is Litwareinc\dpark. In this case, the user mapping table entry for Dan is as follows:

    • DCTMCredentialDomain: '' (empty string)

    • DCTMCredentialRepository: Finance

    • DCTMCredentialLoginName: dpark

    • NTCredential: Litwareinc\dpark

    Example: In Documentum there is a group called Marketing Department that is located in the M&S repository. The group is mapped to the AD group Marketing under the domain Litware. In this case, the group mapping table entry is as follows:

    • DCTMCredentialDomain: '' (empty string)

    • DCTMCredentialRepository: M&S

    • DCTMCredentialLoginName: Marketing Department

    • NTCredential: Litware\Marketing

    Note

    • A cell that has no value assigned cannot be NULL. You must assign the empty string value '' instead.

    • For each Documentum group there must be an NT group in the user mapping table, and both groups must contain the same users.

    • The user mapping table contains mappings for both users and groups. The table must be kept up to date. If you add a group in Documentum, you need to add a corresponding group in Active Directory and set up a mapping to it.

    Use the following script to create a user mapping table:

    CREATE TABLE <replace with your user mapping table name>
    (
        DCTMCredentialDomain nvarchar(255) NOT NULL,
        DCTMCredentialRepository nvarchar(32) NOT NULL,
        DCTMCredentialLoginName nvarchar(80) NOT NULL,
        NTCredential nvarchar(255) NOT NULL,
        CONSTRAINT PK_CredentialMapping PRIMARY KEY CLUSTERED
            ( DCTMCredentialDomain, DCTMCredentialRepository, DCTMCredentialLoginName )
    )
    

    Populate the new mapping table with Documentum/NT credential pairs, as shown in the examples earlier in this step. Grant the OSearch14 account read access to this table.
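Before loading rows into the user mapping table, it can help to check them against the constraints described above. The following Python sketch is only an illustration: the helper name validate_rows and the dictionary-based row format are assumptions made for this example and are not part of the connector. The column limits and key columns are copied from the CREATE TABLE script, and the sample rows are the two examples from this section.

```python
# Column length limits copied from the CREATE TABLE script in this article.
LIMITS = {
    "DCTMCredentialDomain": 255,
    "DCTMCredentialRepository": 32,
    "DCTMCredentialLoginName": 80,
    "NTCredential": 255,
}
# Columns that form the primary key PK_CredentialMapping.
KEY_COLUMNS = (
    "DCTMCredentialDomain",
    "DCTMCredentialRepository",
    "DCTMCredentialLoginName",
)


def validate_rows(rows):
    """Return a list of problem descriptions (hypothetical helper).

    An empty list means every row satisfies the NOT NULL, length,
    and primary-key rules. Keys are compared as exact strings here;
    SQL Server may compare them case-insensitively, depending on collation.
    """
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        for col, limit in LIMITS.items():
            value = row.get(col)
            if value is None:
                problems.append(f"row {i}: {col} is NULL; use '' instead")
            elif len(value) > limit:
                problems.append(f"row {i}: {col} exceeds {limit} characters")
        key = tuple(row.get(col) for col in KEY_COLUMNS)
        if key in seen:
            problems.append(f"row {i}: duplicate primary key {key}")
        seen.add(key)
    return problems


# The two example mappings from this article (one user, one group).
ROWS = [
    {"DCTMCredentialDomain": "", "DCTMCredentialRepository": "Finance",
     "DCTMCredentialLoginName": "dpark", "NTCredential": "Litwareinc\\dpark"},
    {"DCTMCredentialDomain": "", "DCTMCredentialRepository": "M&S",
     "DCTMCredentialLoginName": "Marketing Department",
     "NTCredential": "Litware\\Marketing"},
]

if __name__ == "__main__":
    for problem in validate_rows(ROWS):
        print(problem)
```

The check runs on plain data, so it can be applied to an export of the mapping before any SQL is executed.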

Install and configure the Indexing Connector for Documentum

Use the following procedure to install and configure the Indexing Connector for Documentum.

To install and configure the Indexing Connector for Documentum

  1. Verify that you have the permissions that are required to perform this procedure. For more information, see Add-SPShellAdmin.

  2. Open the Windows PowerShell command console.

  3. On each SharePoint Server 2010 server in the farm that is running a crawl component, run the Indexing Connector for Documentum DCTMIndexConn.exe. Follow the steps presented in the installation wizard.

  4. On the SharePoint Server 2010 crawl server, use the Windows PowerShell cmdlet New-SPEnterpriseSearchCrawlCustomConnector to register the indexing connector with the FAST Search Content search service application.

  5. Use the following example for a single FAST Search Content search service application:

    New-SPEnterpriseSearchCrawlCustomConnector -SearchApplication "<name of your FAST Search Content search service application>" -Protocol "dctm" -ModelFilePath "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\CONFIG\SearchConnectors\Documentum\MODEL.xml" -Name "Microsoft SharePoint 2010 Indexing Connector for Documentum"

  6. Use the following example for all search service applications on the farm:

    Get-SPEnterpriseSearchServiceApplication | New-SPEnterpriseSearchCrawlCustomConnector -Protocol "dctm" -ModelFilePath "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\CONFIG\SearchConnectors\Documentum\MODEL.xml" -Name "Microsoft SharePoint 2010 Indexing Connector for Documentum"

  7. On each SharePoint Server 2010 crawl server, set the configuration details using the following Windows PowerShell cmdlet: Set-SPEnterpriseSearchDCTMConnectorConfig. All the settings are stored in <system drive>\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\CONFIG\SearchConnectors\Documentum\DCTMConfig.xml. If more than one SharePoint Server 2010 crawl server is used, all settings must be the same on each server.

    Use the following Windows PowerShell commands to display help and examples for the Indexing Connector for Documentum:

    • Get-Help Set-SPEnterpriseSearchDCTMConnectorConfig -full shows full help.

    • Get-Help Set-SPEnterpriseSearchDCTMConnectorConfig -examples shows only examples.

    The following lists describe the important parameters of the Set-SPEnterpriseSearchDCTMConnectorConfig cmdlet.

    ACLTranslation: directs the behavior of the ACL translation. Possible values:

    • UserMappingTable: Default value. The Indexing Connector for Documentum translates the Documentum ACL into a Windows ACL according to the user mapping table. UserMappingTableSQLServer, UserMappingTableSQLInstance, UserMappingTableDBName, UserMappingTableName and UnmappedAccount take effect only when ACLTranslation is set to "UserMappingTable".

    • NoSecurity: The Indexing Connector for Documentum ignores the Documentum ACL during crawl. That is, all documents from Documentum will be searchable by any SharePoint user. This option enables you to decline enforcement of security trimming or implement custom security trimming.

    • SameAccountName: The Indexing Connector for Documentum assumes Documentum and SharePoint users share the same account, such as a shared account in Active Directory. If an account is not a valid NT account, the Indexing Connector for Documentum discards the permission for that account.

    UnmappedAccount: defines how to process Documentum accounts that have no corresponding Windows account defined in the user mapping table. Possible values:

    • DiscardACE: Default value of "UnmappedAccount". The Indexing Connector for Documentum discards a Documentum account when no mapped Windows account is found. If there is any other mapped account for the document, the document will be crawled. If none of the accounts for this document can be mapped, the document will be discarded and an error message will be entered in the crawl log.

    • AssumeSameAccount: Assumes the NT account is the same as the Documentum account.

    Other parameters:

    • UserMappingTableSQLServer: Host name of the computer that is running SQL Server which contains the user mapping table.

    • UserMappingTableSQLInstance: Name of the SQL Server instance that contains the user mapping table.

    • UserMappingTableDBName: Name of the SQL Server database that contains the user mapping table.

    • UserMappingTableName: Name of the user mapping table.

    • DisplayURLPatternForDocument: DisplayURL pattern for documents. Any valid URL with part of the string replaced by placeholders such as {ObjectId}, {RepositoryName} or {Format}. You can find examples below.

    • DisplayURLPatternForContainer: DisplayURL pattern for folders and cabinets. Any valid URL with part of the string replaced by placeholders such as {ObjectId}, {RepositoryName} or {Format}. You can find examples below.

    • DFSURL: Specify the DFS Web Services URL for each repository that is to be crawled. More than one DFS Web Services URL can be specified for each repository. Use the following format: "RepositoryName1\DFSURL1.1\DFSURL1.2\...\DFSURL1.n\\RepositoryName2\DFSURL2.1\DFSURL2.2\...\DFSURL2.n\..."

    • PersistDCTMACL: Specify whether to store the Documentum ACL in a crawled property. If "PersistDCTMACL" is set to "True", the Indexing Connector for Documentum will store the Documentum ACL information as a crawled property. The default value is "False".

    Example 1: Set to "UserMappingTable" mode.
    Set-SPEnterpriseSearchDCTMConnectorConfig -ACLTranslation "UserMappingTable" -UnmappedAccount "DiscardACE" -UserMappingTableSQLServer "<YourDatabaseServerName>" -UserMappingTableSQLInstance "<YourDatabaseInstanceName>" -UserMappingTableDBName "<YourMappingDatabaseName>" -UserMappingTableName "<YourMappingTableName>" -DFSURL "RepositoryName1\http://MACHINENAME1:PORT1/services\\RepositoryName2\http://MACHINENAME2:PORT2/services\http://MACHINENAME3:PORT3/services" -DisplayURLPatternForDocument "http://MACHINENAME4:PORT4/webtop/component/drl?objectId={ObjectId}&format={Format}&RepositoryName={RepositoryName}" -DisplayURLPatternForContainer "http://MACHINENAME5:PORT5/webtop/component/drl?objectId={ObjectId}&RepositoryName={RepositoryName}"
    
    Example 2: Set to "NoSecurity" mode.
    Set-SPEnterpriseSearchDCTMConnectorConfig -ACLTranslation "NoSecurity" -DFSURL "RepositoryName1\http://MACHINENAME1:PORT1/services\\RepositoryName2\http://MACHINENAME2:PORT2/services\http://MACHINENAME3:PORT3/services" -DisplayURLPatternForDocument "http://MACHINENAME4:PORT4/webtop/component/drl?objectId={ObjectId}&format={Format}&RepositoryName={RepositoryName}" -DisplayURLPatternForContainer "http://MACHINENAME5:PORT5/webtop/component/drl?objectId={ObjectId}&RepositoryName={RepositoryName}"
    
    Example 3: Set to "SameAccountName" mode.
    Set-SPEnterpriseSearchDCTMConnectorConfig -ACLTranslation "SameAccountName" -DFSURL "RepositoryName1\http://MACHINENAME1:PORT1/services\\RepositoryName2\http://MACHINENAME2:PORT2/services\http://MACHINENAME3:PORT3/services" -DisplayURLPatternForDocument "http://MACHINENAME4:PORT4/webtop/component/drl?objectId={ObjectId}&format={Format}&RepositoryName={RepositoryName}" -DisplayURLPatternForContainer "http://MACHINENAME5:PORT5/webtop/component/drl?objectId={ObjectId}&RepositoryName={RepositoryName}"
    
  8. After setting the configuration details, restart the OSearch14 service on each SharePoint Server 2010 crawl server.
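The DFSURL value used in the examples above is easy to mistype, because a single backslash separates a repository name from its URLs and a double backslash separates one repository from the next. The following Python sketch assembles the string from a mapping of repository names to URL lists. The function name build_dfsurl is hypothetical, and the repository names and URLs are the placeholders from the examples above. The sketch assumes that neither repository names nor URLs contain a backslash.

```python
def build_dfsurl(repositories):
    """Build a DFSURL parameter value (hypothetical helper).

    repositories maps each repository name to a list of one or more
    DFS Web Services URLs. The name and its URLs are joined with a
    single backslash; repository groups are joined with a double backslash.
    """
    groups = []
    for name, urls in repositories.items():
        if not urls:
            raise ValueError(f"repository {name!r} has no DFS URL")
        for part in [name, *urls]:
            if "\\" in part:
                raise ValueError(f"backslash not allowed in {part!r}")
        groups.append("\\".join([name, *urls]))
    return "\\\\".join(groups)


if __name__ == "__main__":
    value = build_dfsurl({
        "RepositoryName1": ["http://MACHINENAME1:PORT1/services"],
        "RepositoryName2": [
            "http://MACHINENAME2:PORT2/services",
            "http://MACHINENAME3:PORT3/services",
        ],
    })
    # Paste the printed value into the -DFSURL parameter.
    print(value)
```

With the placeholder input shown, the printed value is the same DFSURL string that appears in Example 1.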

Create a crawl rule for the Indexing Connector for Documentum

Before a crawl, create crawl rules to include or exclude specific content in Documentum. Use the following procedure to create a crawl rule for the Indexing Connector for Documentum.

To create a crawl rule for the Indexing Connector for Documentum

  1. Verify that the user account that is performing this procedure is an administrator for the FAST Search Content search service application.

  2. Open SharePoint Central Administration, and then click Manage Service Applications.

  3. Click the name of your FAST Search Content search service application for which you want to create a crawl rule.

  4. Under Crawling, click Crawl Rules.

  5. On the Manage Crawl Rules page, click New Crawl Rule.

  6. On the Add Crawl Rule page, specify the following information to create at least one crawl rule.

    • In the Path box, type the path of the content that you want to crawl.

      You can use wildcard "*" or regular expression syntax. Because Documentum uses case sensitive names for the content, select the Match case check box. Refer to the section Syntax to refer to a Documentum object for examples.

    • In the Crawl Configuration section, select Include all items in this path, and then select Crawl complex URLs (URLs that contain a question mark - ?).

    • In the Specify Authentication section, select Specify a different content access account, and then type the Documentum content access account and password that you specified earlier in this article in the appropriate boxes.

    • Make sure that the Do not allow Basic Authentication check box is cleared.

  7. Click OK to finish configuration.

    Note

    • You can create multiple crawl rules for Documentum to include or exclude Documentum content.

    • You can use different crawl rules to specify different content access accounts for different Documentum content, for example when you have two repositories that each have their own content access account. The Documentum content access account specified in a crawl rule is applied only to Documentum content covered by the path in that crawl rule.
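To see why the Match case check box matters for Documentum paths, consider the following Python sketch of "*" wildcard matching. It is a simplified illustration, not SharePoint's actual crawl rule matcher. The function name rule_matches, the host name webtop01, and the folder names are assumptions made for this example.

```python
import re


def rule_matches(pattern, address, match_case=True):
    """Check an address against a path pattern (hypothetical helper).

    Only "*" is treated as a wildcard (any run of characters).
    Every other character, including "?", is matched literally.
    """
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    flags = 0 if match_case else re.IGNORECASE
    return re.fullmatch(regex, address, flags) is not None


if __name__ == "__main__":
    rule = "dctm://webtop01/Finance/*"
    # Same repository name, same case: covered by the rule.
    print(rule_matches(rule, "dctm://webtop01/Finance/Reports"))
    # A differently cased name is a different Documentum object,
    # so a case-sensitive rule does not cover it.
    print(rule_matches(rule, "dctm://webtop01/finance/Reports"))
    # Without case matching, the rule would also cover the second object.
    print(rule_matches(rule, "dctm://webtop01/finance/Reports", match_case=False))
```

Because Documentum names are case sensitive, Finance and finance can be different objects, and a rule that ignores case could include or exclude content that you did not intend.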

Create a content source for the Indexing Connector for Documentum

Use the following procedure to create a content source.

To create a content source for the Indexing Connector for Documentum

  1. Verify that the user account that is performing this procedure is an administrator for the FAST Search Content search service application.

  2. Open SharePoint Central Administration, and then click Manage Service Applications.

  3. Click the name of your FAST Search Content search service application in which you want to create a content source.

  4. On the Search Administration page, under Crawling, click Content Sources.

  5. On the Manage Content Sources page, click New Content Source.

  6. On the Add Content Source page, do the following:

    1. In the Name box, type the name of the content source.

    2. In the Content Source Type section, select Custom Repository.

    3. In the Type of Repository section, select the SharePoint 2010 Indexing Connector for Documentum. That is, select the option button next to the name that you specified for the Indexing Connector for Documentum when you registered it with the FAST Search Content search service application, for example dctm.

    4. In the Start Addresses section, type the start addresses. The start address format is the same as the format used to specify the path in a crawl rule. You can type more than one start address for the content source, one per line.

      Refer to the section Syntax to refer to a Documentum object for examples.

    5. In the Crawl Schedules section, select schedules from the Full Crawl and Incremental Crawl drop-down lists, or create schedules for each kind of crawl.

    6. In the Content Source Priority section, assign a priority level to the content source according to your business requirements.

    7. Select Start full crawl of this content source to start a crawl immediately after the content source is created.

    8. Click OK to finish the configuration and accept all configured options.

    The Documentum content source is configured and the system can crawl Documentum content repositories that are specified in the content source.

SharePoint Server 2010 supports a scalable architecture for performance scale-out. You can deploy more than one crawl server and configure multiple SharePoint Server 2010 crawlers to crawl the EMC Documentum database simultaneously.

Syntax to refer to a Documentum object

The format that you use to refer to a Documentum object in the path (when you set up a crawl rule) and in the start address (when you set up a content source) is defined in the following list, by type of Documentum object:

  • Repository: dctm://<clientapphostname>/<repository name>

  • Cabinet: dctm://<clientapphostname>/<repository name>/<cabinet name>

  • Folder: dctm://<clientapphostname>/<repository name>/<cabinet name>/<folder name>

  • Document: dctm://<clientapphostname>/<repository name>/<cabinet name>/<folder name>/…/<folder name>?DocSysID=<r_object_id> (where r_object_id is the object ID of that document)

<clientapphostname> is the host name of your Documentum client application such as Webtop or DA. The <clientapphostname> configured here should be the same as the one used in the content source. <repository name>, <cabinet name>, and <folder name> are case sensitive.
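The syntax above can be assembled programmatically when you generate many paths or start addresses. The following Python sketch is a minimal illustration. The function name dctm_address, the host name webtop01, the cabinet and folder names, and the object ID are hypothetical. The sketch inserts names verbatim, with no URL encoding and no case changes, which is an assumption about what the connector expects.

```python
def dctm_address(host, repository, cabinet=None, folders=(), object_id=None):
    """Build a dctm:// path or start address (hypothetical helper).

    host is the Documentum client application host name. Names are
    inserted exactly as given because they are case sensitive.
    """
    if folders and cabinet is None:
        raise ValueError("a folder path requires a cabinet")
    segments = [repository]
    if cabinet is not None:
        segments.append(cabinet)
    segments.extend(folders)
    address = "dctm://" + host + "/" + "/".join(segments)
    if object_id is not None:
        # A document is addressed by its r_object_id.
        address += "?DocSysID=" + object_id
    return address


if __name__ == "__main__":
    print(dctm_address("webtop01", "Finance"))
    print(dctm_address("webtop01", "Finance", "Reports"))
    print(dctm_address("webtop01", "Finance", "Reports", ["2010", "Q1"]))
    print(dctm_address("webtop01", "Finance", "Reports", ["2010"],
                       object_id="0900000180001234"))
```

Using the same helper for both crawl rule paths and content source start addresses keeps the host name identical in both places.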