Managing Metadata in SQL Server 2005
Writers: Mark Chaffin, Brian Knight
Applies To: SQL Server 2005
Summary: This white paper covers several interesting and unique methods for managing metadata using built-in features in SQL Server 2005 Integration Services, Analysis Services, and Reporting Services. The paper explains such concepts as data lineage, business and technical metadata, auditing, and impact analysis.
On This Page
Introduction
Business Metadata in SQL Server 2005
Technical Metadata in SQL Server 2005
Metadata Intelligence
About The Authors
Conclusion
Introduction
What is metadata? The short answer is that metadata is data about data. The long answer continues to fill volumes of books, and challenges both technical staff and business users. Our primary goal in this white paper is to discuss some different approaches for creating, maintaining, managing, and accessing metadata for the purposes of augmenting reporting and analysis systems. We also explain how to use metadata in features in the SQL Server 2005 platform to improve the overall end user experience.
What is metadata used for? Depending on your viewpoint, metadata is used to add context and understanding about data that users are accessing, or it can be used to hide complexity from end users who are not required to know or understand the technical details of data. Metadata can also be used by systems to determine data types, to reveal the structure and relationships between data, or to determine if and when data last changed, and what changes were made to data. Metadata is also used by applications and systems to perform type checking, data validation, and formatting of data.
In most business intelligence projects, metadata is classified by usage as either business metadata or technical metadata. Business metadata means data available to the end user that helps explain the source or meaning of the data. Technical metadata refers to metadata that is used to support back-office functions, and therefore is of interest primarily to developers or administrators. Some examples of metadata include data models, schemas, tables, columns, indexes, cubes, reports. Metadata is created by data modeling tools, databases, report writing tools, ETL tools, and of course, end users. In later sections we'll discuss some of these tools and where metadata can be created, configured and maintained.
To help the developer generate and work with metadata, Analysis Services (SSAS) has a new feature named the Universal Dimensional Model, or UDM. By using the UDM, the developer can create a single, unified logical model for both OLAP and relational reporting and then share that model with other applications and systems through XML for Analysis or OLE DB for OLAP. SSAS also has a new object model called AMO or Analysis Management Objects that enables developers to programmatically manage the multidimensional metadata.
In SQL Server Integration Services (SSIS), the developer can configure and design packages that extract and organize metadata from the source systems, attach the metadata to the data that is being transformed and cleansed, add more metadata from the ETL process, and finally load the cleansed data into the data warehouse and the associated metadata into the metadata database.
Perhaps the most innovative feature in SQL Server 2005 that pertains to metadata is the Report Builder. By using Report Builder, a developer or administrator can publish business metadata in a form that helps end users better understand the data without having to understand the underlying data structures.
In this white paper, we'll discuss how to combine these new features in SQL Server 2005 to address the core need of managing and presenting business and technical metadata. This white paper discusses the following topics:
How SQL Server 2005 handles business and technical metadata.
How to integrate metadata and lineage into your SSIS package to comply with laws like the Sarbanes-Oxley Act.
How to tap into metadata by using the SQL Server 2005 Business Intelligence (BI) Metadata Samples Toolkit.
How to extend the SQL Server 2005 BI Metadata Samples Toolkit.
Business Metadata in SQL Server 2005
Business metadata includes metadata that an end user can access, and typically is used to explain how data is sourced, configured, or calculated. In SQL Server 2005, business metadata is pervasive through the platform products, starting with the database engine and continuing through the other servers and applications. Each server type — Analysis Services, Integration Services, and Reporting Services — exposes metadata programmatically as well as through functionality included in the respective design and administrative tools.
Business metadata encompasses the names, descriptions, and captions for tables and columns. Business metadata can also refer to input masks or formatting rules that applications use to validate data, or to information about how the business definition of the data element is stored, or who is responsible for managing the object. All this metadata is stored in the database engine.
Business metadata is especially useful in Analysis Services, for giving meaning to dimensions and elements, and in Report Builder, for making reports easier to understand by the end user. Analysis Services requires a front-end reporting or analysis tool to expose metadata, but the inclusion of metadata in the analysis engine supports a consistent, manageable approach to business metadata. This section covers the use of business metadata by Analysis Services and Report Builder in more detail. Use of metadata by Integration Services is covered in the section on Technical Metadata.
Metadata and the Database Engine
In the database engine, SQL Server 2005 adds support for extended properties to almost all database objects. Extended properties can be used for rich metadata management because each table and column — and now most other objects in the relational engine — can have multiple extended properties, each containing up to 7500 bytes of data. Because these properties are scriptable and are part of the structure of each object, developers can use these properties to store business-level definitions of each object.
For example, the figure shows the extended properties for the Person.Address table in the AdventureWorks sample database. This table has the extended property MS_Description.
To view the extended properties for a table, open SQL Server Management Studio. In Object Explorer, locate the database you want to work with, and expand the Tables node. Right-click the table or object that you want to view, and then click Properties.
Figure 1 Extended properties for tables
This extended property, MS_Description , is created when you modify a table through its property page and add a description to a table or column definition. However, you can add many more extended properties, and manage extended properties by using the Transact-SQL statement calls listed in the following table:
Table 1 System stored procedures and functions for working with extended properties
Action |
Statement |
---|---|
Add an extended property |
sp_addextendedproperty |
Delete an extended property |
sp_dropextendedproperty |
Update an extended property |
sp_updateextendedproperty |
View an extended property |
fn_listextendedproperty sys.extended_properties |
The following Transact-SQL example updates the extended property MS_Description with a new value:
USE AdventureWorks; GO EXEC sys.sp_updateextendedproperty @name = N'MS_Description', @value = N'New Description for the Address table', @level0type = N'SCHEMA', @level0name = Person, @level1type = N'TABLE', @level1name = Address; GO
To display all the table-level properties in the Person schema, including the one we just updated, execute the following Transact-SQL example:
USE AdventureWorks; GO SELECT objtype, objname, name, value FROM fn_listextendedproperty (NULL, 'schema', 'Person', 'table', default, NULL, NULL); GO
The results of this query are shown in the following table:
Table 2 Extended properties for tables in the Person schema
objtype |
objname |
name |
value |
---|---|---|---|
TABLE |
Address |
MS_Description |
Street address information for customers, employees, and vendors. |
TABLE |
AddressType |
MS_Description |
Types of addresses stored in the Address table. |
TABLE |
Contact |
MS_Description |
Names of each employee, customer contact, and vendor contact. |
TABLE |
ContactType |
MS_Description |
Lookup table that contains the types of contacts stored in the Contact table. |
TABLE |
CountryRegion |
MS_Description |
Lookup table that contains the ISO standard codes for countries and regions. |
TABLE |
StateProvince |
MS_Description |
State and province lookup table. |
Metadata and Analysis Services
In Analysis Services, business metadata is embedded throughout the application in data source views (DSVs), dimensions, attributes, hierarchies, cubes, translations, actions, and many other objects. All these objects contain, manage and display metadata in different ways. The developer can enhance the way that end users interact with business metadata either by adding the metadata directly to the structure of the objects, or by allowing the user to take action based on the metadata associated with an object. In this section, we'll briefly discuss some of these features.
Dimension Attributes
In Analysis Services 2005, dimension attributes encompass objects that in the earlier version of Analysis Services were formerly member properties, levels or hierarchies. Dimension attributes now define the dimension hierarchy and levels, and all supporting properties become alternative hierarchies with a default "All" level. In other words, a dimension is a collection of attributes that either defines one of many hierarchies or provides a grouping of dimension members.
The following figure illustrates the concept of dimension attributes by using the Product dimension from the AdventureWorksDW Analysis Services database. Each level listed in the Hierarchies and Levels pane corresponds to an attribute in the lower pane.
Figure 2 Dimension structure tab showing attributes and hierarchies
Attributes that do not define levels in a hierarchy can become hierarchies of their own, similar to building a virtual dimension in Analysis Services 2000.
After you have created a dimension, you can add attributes to hierarchies and edit attribute properties in the Dimension Designer. Each attribute can be bound to one or more columns in a dimension table, such as columns for the KeyColumns and NameColumns properties.
Translations
Analysis Services 2005 includes support for attributes in multiple locales through the use of translations. Whereas in Analysis Services 2000 locales were supported only for data, Analysis Services 2005 supports multiple languages for the metadata and structural elements. Every element — from the dimension name itself to the levels, member attributes, hierarchies and member names — can have a different translation for multiple locales. The figure shows the translation table for the Time dimension. Notice that the Time dimension, the [All Periods] member, and the Month Name member all have translations in both Spanish and French.
Figure 3 Translation tab for the Time dimension
Actions
Actions are MDX statements associated with dimensions, levels, members, or cells. An action allows the client application to perform a task, such as starting an application and using the selected item as a parameter. By using actions, the client application can provide a seamless link to another application for metadata or data lookup. For example, a business user browsing a cube notices that a particular customer has recently made a large purchase. The client application presents a list of actions that the business user can select from, one of which is to send e-mail to the customer, based on the stored e-mail address of the customer.
Actions provide an easy way for end users to act on data in ways that are context- sensitive and metadata-driven, in effect, closing the loop between analysis and activity. Actions are managed in cubes and can be associated with attribute members, cells, cubes, dimension members, hierarchies, hierarchy members, levels, or level members.
Dimension Intelligence
Analysis Services 2005 provides the ability to define metadata for dimension types. The metadata on these types can be used to control the behavior of the dimension or to add functionality in reporting and analysis applications. Several of the dimension types have specific purposes and are used by other parts of the platform. For example, if you have a Currency dimension and set its type appropriately, Analysis Services uses this information to set the appropriate conversion rules. Other types that have additional behavior enhancements include Account and Time dimensions.
Metadata and Report Builder
Report Builder is an important new component in SQL Server 2005 Reporting Services (SSRS) that allows business users to explore and browse information without having to understand the technical schema that may be supporting the information. Report Builder is an abstraction layer over the physical schema of a database. Report Builder allows business users to explore data in an ad hoc manner using familiar Microsoft Office paradigms. Business users use Report Builder to create Reporting Services reports using Report Definition Language (RDL), which can be published like other SSRS reports to a centralized reporting server.
Business users interact with data from SQL Server or Analysis Services using a predefined Report Model that is created in Business Intelligence Development Studio. Models can also be created by using the Report Model Wizard. Use of the wizard speeds development time, but the wizard makes assumptions and inferences based on automatic data exploration; therefore, if you use the wizard, you should expect to continue to refine the model after the wizard has finished running. The figure shows a model that was created by using the wizard, based on the AdventureWorksDW relational database. This shows a business view of the warehouse that will later be exposed in reports. Use of the business view means that the user won't have to know the physical names for columns.
Figure 4 Report Model tab in Report Builder
Technical Metadata in SQL Server 2005
This section discusses enhancements in SQL Server 2005 that support back-office functions that may include auditing, logging, dependency analysis, and change management. This section focuses particularly on the new capabilities of Integration Services, the Unified Dimensional Model (UDM), and data source views.
Integration Services
Integration Services (SSIS) provides the developer with a rich and sophisticated set of tools to manage, manipulate, and move data from sources to targets. Metadata about the data schema and source is contained in the SSIS packages and in the audit trail of package execution; metadata about the data that travels through the package transformations is tracked through lineage. This section discusses each of these new features.
Data Flow Enhancements
Integration Services includes a new data transformation called a Derived Column transformation. Using the Derived Column transformation, a developer can add a column or columns to the data transformation process based on variables or expressions. The values of these variables or expressions can be set during the execution process, and can include technical metadata such as the file name of the source of the data row, the date/time of the processing, or the name or ID of the Integration Services package that transformed the data. The example in the figure shows how the Derived Column transformation can be used to read technical metadata about the package and use that metadata in an audit. For example, you can audit the version of the package that was executed, and when the package was run. In this example, the metadata being audited includes the creator, build version, major version, minor version, and source file name.
Figure 5 Derived Column Transformation Editor
The data lineage capability in SSIS lets you add more information to the data as it is being processed. The Derived Column transform also allows inclusion of other types of metadata: for example, you can insert system variables into the data flow for auditing purposes.
To use a system variable in the Derived Column transformation, drag the variable from the variable list at upper left into the Expression text box for the derived column. If you type the variable name, make sure that you prefix the variable with @ and specify the System namespace. The complete syntax for a system variable is @ [System::VariableName] where VariableName represents the system variable you want to use. The following table lists some of the system variables that you can include into your package to audit your package and track the lineage of the data being imported:
Table 3 System variables for package auditing
Variable |
Description |
---|---|
ContainerStartTime |
The time when the container began to execute. |
CreatorComputerName |
The name of the computer that created the package. |
CreatorName |
The user name that created the package. |
ExecutionInstanceGUID |
The execution instance of the package. |
InteractiveMode |
Indicates whether the package is being executed interactively by a user or through the DTExec utility (which jobs will use). |
LocaleID |
The locale used for the data flow task. |
MachineName |
The computer name where the package is running. |
PackageID |
The unique GUID for the package. |
PackageName |
The package's name. |
StartTime |
The time when the package was executed. |
TaskID |
The TaskID for the data flow task |
TaskName |
The name of the task instance. |
TaskTransactionOption |
The transaction option that the task uses. |
UserName |
The user name the started the package. |
VersionBuild |
The package version. |
VersionComments |
String variable that lists any comments that are in the package. |
VersionGUID |
The GUID that represents the unique package version. |
VersionMajor |
The major version for the package. |
VersionMinor |
The minor version for the package. |
You are not limited to system variables, but can create and use user-defined variables in the Derived Column transformation. For example, you can read a header record to determine the batch ID for the extract, load that batch ID into a variable, and then insert the value of that variable into the data flow using the Derived Column transformation.
Auditing and Logging
The Sarbanes-Oxley Act of 2002 is an American law that regulates the accountability of publicly owned companies. The main purpose of the law is to make sure that a financial statement that a company generates is legitimate and that there is no possibility that the financial statement can be manipulated. Depending on how compliance with the Act is interpreted, preventing manipulation of financial statements can greatly affect the work of the IT department. Some interpret the law to mean that the IT department must audit all changes to a database, which would involve storing information about the identity of anyone who loads a row into a database and when the row was added.
If you work for a company where auditing is important, you can capture that type of lineage information easily in SSIS by using the Audit transformation. The Audit transformation adds extra columns to the data flow that contain lineage information about the package, task, or system. The following table lists audit values that you can add to the data flow in SQL Server 2005 Integration Services:
Table 4 Values available in the Audit transformation
Value |
Description |
---|---|
Execution Instance ID |
The unique identifier (GUID) that contains the execution instance of the package. |
Execution Start Time |
The time when the package began to execute. |
Machine Name |
The name of the computer where the package was executed. |
Package ID |
The GUID that contains the unique ID for the package. |
Package Name |
The name of the package. |
Task ID |
The GUID for the Data Flow task that contains the Audit transformation. |
Task Name |
The name of the Data Flow task that contains the Audit transformation. |
User Name |
The name of the user who executed the package. |
Version ID |
The GUID that contains the unique ID for the version of the package. |
To add this lineage information to your SSIS package, drag the Audit transformation from the Toolbox and drop the transformation onto the data flow design surface at the point in the processing where you wish to begin auditing. Double-click the transformation to configure it to add outputs for the Username and Execution Start Time, as shown in the illustration:
Figure 6 Audit Transformation Editor
This transformation is most frequently used to capture lineage data about records that SSIS manipulated, or about records that could not be loaded. For example, when rows contain errors that cannot be corrected automatically, you can add auditing data to the row and then insert the row into an error queue, as shown in the figure. This lineage data will help an operations person debug the source of the error.
Figure 7 Audit package example
To perform auditing on error rows, you must configure the error output on the transformation or data connection (source or destination). The ability to configure handling of errors at the row level is one of the most significant enhancements in SQL Server 2005 Integration Services. In SQL Server 2000 Data Transformation Services (DTS), the whole package could fail if a single record contained an error such as a foreign key violation. After debugging the issue you would then have to rerun the package.
In contrast, in Integration Services, you can specify how you would like SSIS to handle the error. You can choose to ignore the error, cause the package to fail on the error, or redirect the bad row to a different path in the pipeline. The advantage to redirecting the row is that you can try to clean the data in the redirected row, and, if automatic data cleaning is not successful, you can redirect the row one more time to an error queue, to be manually corrected later.
To configure a transformation or data connection to output records that have failed to a different path, click Configure Error Output in most transformations, or click the Error Output page in a source or destination. After the Configure Error Output dialog box opens (see Figure 8), specify how you want to handle the error. There are two classes of errors: truncation errors, and errors that are specific to the transformation or connection. A truncation error occurs when you try to move data into a column that has a smaller data type than your data. Errors that are specific to the transformation or connection vary depending on the type of processing done by the transformation or connection that you use. A source and destination typically have conversion errors, whereas the Lookup transform generates an error if the record cannot be found in the lookup table. The type of error is output in the Description column, seen at the right side of the figure.
Figure 8 Configure Error Output dialog box
SSIS Metadata API
Although an SSIS package can be viewed as an XML file, the easier way to access the metadata is through the sophisticated SSIS API. There are dozens of classes, interfaces, and enumerations in the Microsoft.SqlServer.Dts.Runtime namespace that you can use to work with package metadata. For example, the ConnectionInfo class contains information about a connection on a computer, whereas the ConnectionInfos collection is a collection of all the connections on a computer.
SSIS Logging
By using SQL Server 2005 Integration Services, you can log information about the package's behavior at run time to any of five log providers: text file, SQL Profile trace, SQL Server table, XML file, or the Windows Event log. You can also write to multiple providers if you want to audit the package using different methods or store different types of information in different log formats. If the log providers available with SQL Server 2005 do not meet your requirements, you can create a custom logging provider.
The most commonly used log provider writes audit information to a SQL Server table. This log provider is useful because information in the log can be output to a report created by using Reporting Services, and the report can be viewed on a Web page or subscribed to for delivery through e-mail to see a package's status. If you select this provider, you must create a connection manager that specifies the database and table you want to write to, and then configure logging on the package to specify the events you want to log. The diagram shows an example of the log events that are available. You can log warnings, errors, or information messages, and you can specify different details for each event or message.
Figure 9 Configure SSIS Logs dialog box
When you select the SQL Server database option for logging, a table named sysdtslog90 is created in the database that you use to store the log events. The table typically contains operational data such as when each task started and stopped, and the success of each task. The most important column in the table is the message column that stores the verbose result of that step. The sysdtslog90 table can grow quite large so you should archive the table periodically.
Data Source Views
A data source view (DSV) presents schema information such as tables and views from multiple data sources in a single, unified view of the metadata. A data source view is also used to filter unnecessary objects from a data source, or in Analysis Services, to join the data from a data warehouse to its associated metadata.
The data source view allows the developer to define a business view of the data that makes the backend processes transparent. For example, you may have a table in the DSV that translates a Siebel column name like "E432K3" to a friendly name like "First Name". The user would not have to open his data dictionary to determine what E432K3 is and his productivity would increase greatly. After the initial UDM is created from the DSV, there is less need for a data expert and the work of reporting can be shifted to the business users, who can develop new, powerful reports on their own.
In Integration Services, the advantages of using a data source view (DSV) include the following:
A DSV can be defined one time and then used by multiple data flow components.
A DSV can be refreshed to reflect changes in its underlying data sources.
Metadata from the data source is cached, allowing developers to develop cubes and packages without having live connections to the source data systems.
A DSV can focus the lists of objects that appear in the user interface of data flow components, simplifying the interface and speeding up design.
Metadata Intelligence
In a large enterprise environment, you may have dozens if not hundreds of SSIS packages that run at various times to load myriad tables. In this environment, a new DBA might decide that it would be more effective to change the data type of a column from varchar into small integer. What would be the impact? No matter how small the change, the impact may extend to 15 or more packages. The SQL Server 2005 BI Metadata Samples Toolkit is a discovery tool that you can use to explore the impact of such changes.
There are several approaches that you can use when analyzing the metadata of a business intelligence system. One approach that you could use with Integration Services packages is to create an XPATH query to search the XML files that make up each package. The second approach, which is used in the SQL Server 2005 BI Metadata Samples Toolkit, is to load the package into memory by using the runtime engine and extract the properties by using the Integration Services API.
Metadata Intelligence Tools in SQL Server 2005
A team at Microsoft has developed a free set of utilities and samples called the SQL Server 2005 BI Metadata Samples Toolkit to help you solve the metadata challenge. You can use the Microsoft solution as is, or you can use the Toolkit as the basis for developing your own solution for impact analysis. The SQL Server 2005 BI Metadata Samples Toolkit includes the following items to help you create your own metadata solution:
DependencyAnalyzer.exe – Tool that evaluates and loads into a database the lineage information from SSIS packages, Analysis Services objects, and SQL Server databases. We'll show you how to specify which areas to scan in the table that discusses the various switches. All the source code for this program is provided.
DependencyViewer.exe – A tool that lets you graphically see the dependencies and lineage of objects in the lineage repository. Source code is provided for this program.
Data Source View – A data source view that connects to the lineage repository and that can be used by Reporting Services.
Lineage Repository – A database named SSIS_META, which is stored in SQL Server, and can be used to house metadata from almost any metadata management system.
Reports – Standard reports for impact analysis studies. The Toolkit contains two essential reports and several sub-reports.
Report Model – A report model that you can use with Report Builder to allow end-users to create ad hoc reports.
Integration Services Samples – Sample packages that you can audit or view lineage of objects.
The "Step by Step" section later in this paper provides detailed directions on how to use the tools in this Toolkit. Here we present an overview of the process and features.
First, you must create an empty database for the lineage repository and name the database SSIS_META. Next, you populate the database by running the script SSIS_META_creation.sql while the database is open in SQL Server Management Studio. After the SSIS_META database has been populated, you can then execute DependencyAnalyzer.exe at the command prompt to scan the local default instance. DependencyAnalyzer.exe has been made available in this Toolkit as a command-line tool (console application) so that you can schedule a SQL Server Agent job to run DependencyAnalyzer.exe automatically and scan the system nightly or weekly. DependencyAnalyzer.exe has been configured to run without switches set on the default instance of SQL Server, but you can pass a variety of switches to the program. The table lists the available command-line switches, which can also be viewed by typing DependencyAnalyzer.exe /? at the command line:
Table 5 Dependency Analyzer command-line switches
Switch |
Description |
---|---|
/depDb:<string> |
ADO.Net SqlConnection compatible connection string to dependency database. Default value: 'Server=localhost;database=SSIS_Meta;Integrated Security=SSPI;' (short form /d) |
/folders:<string> |
Root folders of file system which stores your SSIS packages. (short form /f) |
/isDbServer:<string> |
Instance of SQL Server where SSIS packages are stored. Default value:'localhost' (short form /i) |
/isDbUser:<string> |
SQL Server user who has access to stored SSIS packages. |
/isDbPwd:<string> |
SQL Server password of the user who has access to stored SSIS packages. |
/recurse[+|-] |
Indicates whether to recurse subfolders in the file system when enumerating objects. Default value:'+' (short form /r) |
/batchMode[+|-] |
Indicates whether to start execution without asking the user to continue. Default value:'-' (short form /b) |
/skipSQL[+|-] |
Indicates whether to skip enumerating packages that are stored in SQL Server. Default value:'-' (short form /s) |
/skipSSIS[+|-] |
Indicates whether to skip enumerating packages completely. Default value:'-' |
/asCon:<string> |
AMO compatible connection string to Analysis Services. Default value:'Provider=msolap;Data Source=localhost;' (short form /a) |
/skipAS[+|-] |
Indicates whether to skip enumerating Analysis Services objects. Default value:'-' |
@<file> |
Read response file for more options. |
After the metadata has been scanned, you can run reports against the SSIS_META database to see the impact of making a database change.
The SQL Server 2005 BI Metadata Samples Toolkit includes another tool that you can use to perform an impact analysis study. DependencyViewer is a stand-alone executable utility that helps you navigate through the dependencies. By using this tool, you can run a quick impact analysis and review the results graphically to determine which packages will be impacted by a change to a data source.
To view dependencies, open DependencyViewer, and then click Load to connect to the local repository. The tree view in the pane at left shows SSIS packages and other BI objects that are stored in the local repository. You can browse through the tree and select a SQL Server table to see the lineage, highlighted in blue. The lineage indicates where in the package the table is used. The example in the diagram is based on the Analysis Services cube sample that is provided with SQL Server 2005. In this case, you can see the analysis objects that would be affected by any change to the Product dimension.
When you click an object in the tree view, the description of the object (if one exists) is displayed in the Object Properties window. The object that you selected in the tree view in the left pane appears in the Lineage pane in brown. The objects to the left of that brown box were used to load the Product dimension table. The objects to the right of the brown box are the objects that would be affected if you changed the Product dimension. For example, the diagram shows that three tables — DimProduct, DimProductSubCategory, and DimProductCategory — were used as the source of the Product dimension. The Product dimension is then used in three measure groups in three cubes.
Figure 10 Dependency Viewer
The SQL Server 2005 BI Metadata Samples Toolkit also contains various reports for impact analysis and metadata, created using Reporting Services. A report model is also provided, which lets users create ad hoc reports by using Report Builder and a data source view.
The SSIS_META.dsv data source view (SSIS META.dsv) extrapolates the tables that are included with the Metadata Analyzer out to many virtual tables. Each table represented in the data source view is a named query. The queries are hierarchical, each targeting a different level of the package structure, with the Packages table at the top of the hierarchy.
You can use the SSIS_META.dsv file to simplify creating a report in Reporting Services. You won't have to know how to get to the underlying tables; instead, you can walk the hierarchy that is shown in Figure 11. The underlying tables and views are discussed in the next section.
The report model, provided in the file SSIS META.smdl, lets you create an ad hoc report by using Report Builder. The SSIS META.smdl report model is bound to the SSIS META.dsv, and the smdl file contains the definition of the hierarchy and table links that are shown in the screenshot. For more information about how to extract and use the smdl file, see the step-by-step guide in this white paper.
Figure 11 Metadata Analysis Sample Pack
Extending SQL Server 2005 BI Metadata Samples
To use the functionality in the Server 2005 Business Intelligence Metadata Samples Toolkit you will need to become familiar with the schema of the SSIS_META database. We recommend that you use the views that are provided in the data source view itself to do most of the table access, because the views give you a cleaner look into the system. The underlying table structure is flexible enough to load metadata from almost any tool, but to make the metadata more user-friendly, you can add new views to support custom objects. The following table lists the views, available in the SSIS_META database, that are provided in this Toolkit:
Table 6 Views included in Toolkit
View Name |
Description |
---|---|
Connections |
Shows all the connections. |
ConnectionsMapping |
Shows by ID which source connections map to which target connections. |
DataFlows |
Shows all the data flow tasks. |
LineageMap |
Shows each lineage object as a pair of IDs for the source and target objects. |
ObjectRelationships |
Shows the parent-child relationships between objects, by ID. |
Packages |
Lists all packages that have been scanned by DependencyAnalyzer. |
SourceTables |
Lists all tables that are defined as sources in a package. |
TableLineageMap |
Shows the mappings between source and target tables. |
TargetTables |
Lists tables that are defined as targets in a package. |
Although these views and the other tools in this Toolkit let you view the data, to extend the schema for support of modeling repositories or other ETL systems such as Ab Initio, you will have to know how to insert metadata into the back-end tables. The following table lists the tables in the SSIS_META database, and what each table is used for. You can then use these tables to extend the tool into areas that it does not currently analyze.
Table 7 List of tables in SSIS_METADATA
Table Name |
Description |
---|---|
Audit |
Contains a list of audited objects and the information that is captured about each object. Provides a template to help you build an audit trail of the metadata that you load. |
ObjectAttributes |
Contains name and value pairs for each attribute that is audited for a specific object: for example, information about a package includes the attribute names PackageGUID and GUID, together with their values. This information is used to create lineage maps. |
ObjectDependencies |
Contains a list of objects and the relationships between them. Used to create a hierarchy of metadata and to map sources to destinations. |
Objects |
Contains a list of objects scanned by Dependency Analyzer.exe. Can be extended to contain any object that you'd like to record lineage on: for example, a table, data flow task, or connection. |
ObjectTypes |
Lists the types of objects to analyze, including transformations, tasks, and dimensions. The initial types included in this table are controlled by the command-line switches of DependencyAnalyzer. |
One key point to remember is that the DependencyAnalyzer tool will delete all your records and refresh the tables each time it runs. Therefore, if you extend the tool by inserting your own data into these tables, you will need to develop some process to back up your data and reload it into the tables after the DependencyAnalzyer has run. You can also customize the DependencyAnalyzer tool to fit your needs using the source code that is provided.
If you extend the tool to scan additional metadata from applications such as Business Objects or ErStudio, you will need to create an additional enumerator in the Visual Studio project. You will also need to add your enumerator into the program.cs file. The following table lists the C# files in the DependencyAnalyzer tool that you can use as a model for these modifications.
Table 8 C# Files in DependencyAnalyzer
File |
Description |
---|---|
CommandLineArguments.cs |
Contains the command line switches that are used in Dependency Analyzer. |
FileEnumerator.cs |
Enumerates through the files that are used as sources or destinations in SSIS packages. |
NativeMethods.cs |
Contains the objects used to support COM interop for converting ProgIDs to CLSIDs. |
Program.cs |
Contains the core shared objects for the program that calls the enumerators. |
RelationalEnumerator.cs |
Enumerates through the relational database tables. |
Repository.cs |
Defines the objects in the SSIS_META database. |
SSASEnumerator.cs |
Enumerates through the SQL Server Analysis Services databases. |
SSISEnumerator.cs |
Enumerates through the Integration Services packages. |
Microsoft SQL Server 2005 BI Metadata Samples Toolkit: Step-by-Step
In this step-by-step guide to using the SQL Server 2005 BI Metadata Samples Toolkit, we assume that your instance of SQL Server contains at least one SSIS package. If you do not have an existing SSIS package, you can install the SSIS sample packages. If you selected to install the samples but cannot see them in SQL Server Management Studio (SSMS), you will need to import the sample packages into SSMS. By default, samples are located in the folder \Program Files\Microsoft SQL Server\90\Samples\Integration Services\Package Samples.
The following procedure explains how to download, extract, and open the SQL Server 2005 BI Metadata Samples Toolkit:
Download and run the SQL Server 2005 BI Metadata Samples Toolkit from MSDN. By default, the SQL Server 2005 BI Metadata Samples Toolkit is installed in the folder C:\Program Files\Microsoft SQL Server BI Metadata Samples.
In SQL Server Management Studio, connect to the instance of SQL Server that will hold your metadata repository and create a database named SSIS_META.
Open a new Query window and switch to the SSIS_META database. Open and execute the SSIS_META_creation.sql script. If you used the installation defaults, this script file can be found in C:\Program Files\Microsoft SQL Server BI Metadata Samples\SQL. This script creates the necessary objects in the SSIS_META database.
In Visual Studio 2005, open the solution file MetadataReportingSamples.sln, found in C:\Program Files\ Microsoft SQL Server BI Metadata Samples. You must have Visual Studio 2005 and Business Intelligence Development Studio installed to see all the projects inside this solution.
In Solution Explorer, right-click the DependencyAnalyzer and DependencyViewer projects and then click Build on each.
Open the folder C:\Program Files\Microsoft SQL Server BI Metadata Samples\DependencyAnalyzer\Bin\Debug (or \bin, depending on what mode you built the project in) and run DependencyAnalyzer.exe. If you use no arguments, both the database you scan and the SSIS_META database must be on the localhost instance.
To scan a database on another computer using Windows authentication, and save the results to the SSIS_META database on your computer, you would enter the following syntax at the command prompt:
Dependencyanalyzer.exe /depDb:"Server=MyDevServer;database=SSIS_Meta;Integrated Security=SSPI;" /isDbServer:MyProductionServer
After the metadata has been scanned, you can begin to report on the lineage. You have several choices for reporting. You can use the DependencyViewer tool, found in C:\Program Files\Microsoft\Microsoft SQL Server Metadata Samples\DependencyViewer\Bin directory, or you can create your own ad hoc Reporting Services reports by using the Report Builder. If you'd like to utilize the report model that is included in the BI MetaData Samples Toolkit, you will need to perform the following steps:
Open the Reporting Services project that is provided in the BI MetaData Samples Toolkit and deploy the report project to your Reporting Services server.
Connect to the Report Manager Web site and then click the Models folder (the default location of the project).
Click Report Builder to open the Microsoft Report Builder tool. If this is your first time running the application, it may take some time for the application to install.
Select SSIS_META as the source for your data when you create a new report.
Title the first example report "Packages with Metadata".
Figure 12 Creating the metadata report in Report Builder
Drag the Package entity from the left pane onto the column area and resize the PackageLocation column to fit inside the report.
Click Run Report to see a preview of the report. You can also click a package to see more details about the package.
If you're satisfied with the results, click Save to deploy the report to your Reporting Services server.
Figure 13 Previewing the metadata report
Providing the metadata report in Report Builder is an easy way to grant your users and developers ad hoc access to your metadata. Because the report model already contains the hierarchies, the user can navigate through the metadata and create a much more detailed report by using filters.
About The Authors
Mark Chaffin is the co-founder and managing partner for business intelligence with Florida Technology (www.floridatechnology.com), a leading provider of enterprise business intelligence solutions on the Microsoft platform. He has been the primary architect of many business intelligence solutions for clients in vertical markets including retail, consumer packaged goods, health care, finance, marketing, banking, technology, and sports and entertainment. He has experience in data mining, transactional application architecture, Internet application architecture, database administration, and database design. He is also the co-author of SQL Server 2000 Data Transformation Services, from Wrox Press, and has authored many articles on business intelligence, SQL Server, DTS, and Analysis Services. He is also a frequent speaker at Microsoft and SQL Server conferences, including PASS and TechEd.
Brian Knight, SQL Server MVP, MCSE, MCDBA, is the co-founder of SQLServerCentral.com and was recently on the Board of Directors for the Professional Association for SQL Server (PASS). He runs the local SQL Server users group in Jacksonville. Brian is a contributing columnist for SQL Server Standard and SQL Magazine and also maintains a weekly column for the database website SQLServerCentral.com. He is the author of Admin911: SQL Server (Osborne/McGraw-Hill Publishing) and co-author of Professional SQL Server DTS and Professional SQL Server 2005 SSIS (Wrox Press). Brian has spoken at conferences including PASS and SQL Connections.
Conclusion
For more information:
https://www.microsoft.com/technet/prodtechnol/sql/default.mspx
Download
Managing Metadata in SQL Server 2005
631 KB
Microsoft Word file