How to: Crawl Binary Large Objects (BLOBs)
This topic describes how you can crawl binary large object (BLOB) data stored in a Microsoft SQL Server database.
Applies to: SharePoint Server 2010
In Microsoft SharePoint Server 2010, the Business Data Connectivity (BDC) service supports reading BLOB data types, which is useful for streaming BLOB data from external systems. By using the Microsoft SharePoint Server 2010 search connector framework, you can crawl the BLOB data that is streamed from the external system. However, to enable this functionality, you must modify the BDC model file manually.
Crawling BLOB Data from a SQL Server Database Table
The Microsoft SQL Server database table must have a column that specifies either the extension or the MIME type of the BLOB data. If the database table schema does not include a column with this information, you must add it to the schema. The following tables contain an example of a database table schema with this column and sample values for it that are stored in the database table.
Table 1. Sample database table schema
Column Name |
Data Type |
---|---|
Id |
Int |
DisplayName |
nvarchar(50) |
Extension |
nvarchar(50) |
Data |
varbinary(MAX) |
ContentType |
nvarchar(MAX) |
Table 2. Sample database table values
Id |
Display Name |
Extension |
Data |
Content Type |
---|---|---|---|---|
1 |
File1 |
.docx |
0x504B… |
application/vnd.openxmlformats-officedocument.wordprocessingml.document |
2 |
File2 |
.doc |
0xD… |
application/msword |
3 |
File3 |
.txt |
OxE… |
text/plain |
Modifying the BDC Model File to Enable Crawling
After you confirm that the database table contains the extension or MIME type information for the BLOB data, you can use Microsoft SharePoint Designer 2010 to create an external content type that is based on the table containing the BLOB data. Then, you can create all the operations. For more information, see How to: Create External Content Types and How to: Create an External Content Type Based on a SQL Server Table.
After you create the BLOB external content type, you are ready to modify the BDC model file to enable crawling. You cannot make these modifications in SharePoint Designer 2010. So you must export the BDC model file, and use an XML editor to make these changes manually.
To export the BDC model file for the BLOB external content type
In SharePoint Designer 2010, click External Content Types in the left navigation to display the external content types that are defined in that site's service application's BDC Metadata Store.
In the External Content Types list, select the BLOB external content type. Then, click Export BDC Model on the Server ribbon.
Type a name in the BDC Model Name text box, and then click OK.
Select the location where you want to save the BDC model (.bdcm) file, and then click Save.
To enable crawling of the BLOB external content type
In an XML editor, open the BDC model file you created in the previous section.
Create a new method that returns the BLOB field. You should define a StreamAccessor type method instance for this method, as shown in the following example.
Note
The table name in this example is Attachment.
<Method Name="GetData"> <Properties> <Property Name="RdbCommandText" Type="System.String">SELECT Data FROM [dbo].[Attachment] WHERE [Id] = @Id </Property> <Property Name="RdbCommandType" Type="System.Data.CommandType, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">Text</Property> </Properties> <Parameters> <Parameter Direction="In" Name="@Id"> <TypeDescriptor TypeName="System.Int32" IdentifierName="Id" Name="Id" /> </Parameter> <Parameter Name="StreamData" Direction="Return"> <TypeDescriptor TypeName="System.Data.IDataReader, System.Data, Version=2.0.3600.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" IsCollection="true" Name="DataReaderTypeDescriptorName"> <TypeDescriptors> <TypeDescriptor TypeName="System.Data.IDataRecord, System.Data, Version=2.0.3600.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" Name="DataRecordTypeDescriptorName"> <TypeDescriptors> <TypeDescriptor TypeName="System.Data.SqlTypes.SqlBytes, System.Data, Version=2.0.3600.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" Name="Data" /> </TypeDescriptors> </TypeDescriptor> </TypeDescriptors> </TypeDescriptor> </Parameter> </Parameters> <MethodInstances> <MethodInstance Name="DataAccessor" Type="StreamAccessor" ReturnParameterName="StreamData" ReturnTypeDescriptorName="Data"> <Properties> <!-- If extension field is available--> <Property Name="Extension" Type="System.String">Extension</Property> <!--If MimeType is available--> <Property Name="ContentType" Type="System.String">ContentType</Property> <!--If attachments is to be displayed in profile pages, add the following property--> <Property Name="MimeTypeField" Type="System.String">ContentType</Property> </Properties> </MethodInstance> </MethodInstances> </Method>
If the MIME type is the same for all the BLOBs, you can replace this line of code from the previous example:
<Property Name="ContentType" Type="System.String">ContentType</Property>
with the following line of code:
<Property Name=" ContentType " Type="System.String">application/vnd.openxmlformats-officedocument.wordprocessingml.document</Property>Re-import the model file by using the Business Connectivity Services service application administration UI. For more information, see Manage BDC Models (SharePoint Server 2010) on MSDN TechNet.
Create the content source for the external content type. For more information, see Add, Edit, or Delete a Content Source (SharePoint Server 2010) on TechNet.
Launch a full crawl of the content source. For more information, see Start, Pause, Resume, or Stop a Crawl (SharePoint Server 2010) on TechNet.