RBS BlobId Columns in Application Tables - Registering and Indexing

I recently got a question on RBS BlobId columns needing to have an index. This blog post expands on the answer. Feel free to send us any questions using either direct email, comments on this blog or the "Discussions" tab on the RBS Codeplex site.

Registering and Unregistering

An application needs to register its RBS BlobId columns by calling the stored proc rbs_sp_register_column, once for each BlobId column. This is a required step for all RBS applications and RBS behavior is undefined if this is not done. Similarly, an RBS BlobId column can be un-registered by calling rbs_sp_unregister_column. This stored proc should be called before dropping/renaming a registered column or dropping/renaming the table containing a registered column. This must be done even if the intent is to create another table/column with the same name soon after dropping them. This is because RBS stores the object_id and column_id, and these will be different for a new table even if the names are the same. The view rbs_columns shows the list of currently registered columns.

Indexing

RBS BlobId columns in application tables should be indexed - for each BlobId column there should be a separate index with that column as the only index key column. If such an index is not present for a column, the stored proc rbs_sp_register_column gives a warning (raises error with level 10): Column < ColumnName > in Table < SchemaName.TableName > is not indexed.

This index is needed for garbage collection to work efficiently. The Reference Scan phase of garbage collection does a diff: (BlobIds RBS knows about) EXCEPT (BlobIds referenced by application tables). The resulting list of Blobs are the ones that were deleted by the application and should be garbage collected. Since the number of blobs can be very large, this diff is done for a few blobs at a time. This translates to multiple range queries. For such a query to be efficient, we need indices on all BlobId columns.

Note that garbage collection will still work even if the index is not present, but it will be terribly slow on large tables. So, it is highly recommended that an index is created on RBS BlobId columns.

 

- Pradeep